
How many shards to set when ~2TB data need to be indexed?


Jingang Wang
Hi there,

I plan to index ~1.9 TB of text data with Elasticsearch. The default number of shards is 5; would that meet my needs?
I can afford about 10 machines to form a cluster.
Thanks in advance for your help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: How many shards to set when ~2TB data need to be indexed?

Clinton Gormley-2
Hiya
>
> I plan to index ~1.9 TB of text data with Elasticsearch. The default
> number of shards is 5; would that meet my needs?

It depends :)

> I can afford about 10 machines to form a cluster.
> Thanks in advance for your help.

Really, it depends.  On:
1) your data
2) how you index it
3) how you query it
4) your hardware
5) your expectations

Don't forget that you will probably need to store at least double that
amount, because for each primary shard you want one or more replica
shards, both to ensure that no data gets lost and to improve search
throughput.
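A rough storage estimate for the numbers in this thread can be sketched as below. It assumes one replica and that the on-disk index is about the same size as the raw text; that second assumption is a guess, because the real ratio depends heavily on your mappings and analyzers.

```python
def storage_needed_tb(primary_data_tb: float, replicas: int, nodes: int = 1):
    """Return (total TB across the cluster, average TB per node).

    Assumes the on-disk index is roughly the size of the raw data,
    which in practice depends on mappings and analyzers.
    """
    if replicas < 0 or nodes < 1:
        raise ValueError("replicas must be >= 0 and nodes >= 1")
    total = primary_data_tb * (1 + replicas)  # every replica is a full copy of its primary
    return total, total / nodes

# ~1.9 TB of data, one replica per primary, spread over 10 machines
total, per_node = storage_needed_tb(1.9, replicas=1, nodes=10)
print(f"total ~{total:.1f} TB, ~{per_node:.2f} TB per node")
```

With one replica the cluster holds two full copies, so the 10 machines share roughly twice the raw data volume.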

The best approach would be:
 * on the type of hardware that you intend to use in production,
 * create an index with a single primary shard, no replicas
 * index your data into that shard
 * run typical queries under typical load
 * measure
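The test index from the second step could be created with settings like these. This is a sketch that only builds the JSON body; the index name `shard_capacity_test` is hypothetical, and you would send the body with whatever HTTP client you already use.

```python
import json

def capacity_test_index_body() -> dict:
    """Index settings for a single-shard capacity test."""
    return {
        "settings": {
            "number_of_shards": 1,    # one primary shard: the unit being measured
            "number_of_replicas": 0,  # no replicas, so only one copy is involved
        }
    }

# Body for: PUT /shard_capacity_test  (hypothetical index name)
print(json.dumps(capacity_test_index_body(), indent=2))
```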

At some point, the shard will stop performing well enough to meet your
expectations. That's the shard limit. Now you know how big to make
your index: divide your total data by that per-shard limit to get the
number of primary shards you need.
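Turning the measurements into a shard count can be sketched as below. The benchmark numbers and the 150 ms p95 target are made up for illustration, not real results, and the sketch assumes query latency grows with shard size, so the largest size that still meets the target is the limit.

```python
import math

def shard_limit_gb(measurements, target_p95_ms):
    """Largest tested shard size (GB) whose p95 query latency met the target.

    `measurements` is a list of (shard_size_gb, p95_latency_ms) pairs taken on
    the single-shard test index; latency is assumed to grow with shard size.
    """
    passing = [size for size, p95 in measurements if p95 <= target_p95_ms]
    if not passing:
        raise ValueError("no tested shard size met the latency target")
    return max(passing)

def primary_shards_needed(total_data_gb, limit_gb):
    """Round up so that no shard has to hold more than the measured limit."""
    return math.ceil(total_data_gb / limit_gb)

# Hypothetical benchmark numbers, NOT real measurements: (shard GB, p95 ms)
measurements = [(10, 40), (20, 65), (40, 120), (80, 310)]
limit = shard_limit_gb(measurements, target_p95_ms=150)
print(limit, primary_shards_needed(1900, limit))
```

Rounding up rather than to the nearest integer keeps every shard at or below the size you actually tested.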

clint



Re: How many shards to set when ~2TB data need to be indexed?

Jingang Wang
Hi Clinton,

Thanks for your reply, it helped me a lot. I am a newbie to ES; it's a pity that the number of shards can't be changed once it has been set.

On Tuesday, February 26, 2013 5:51:41 PM UTC+8, Clinton Gormley wrote:

Re: How many shards to set when ~2TB data need to be indexed?

dadoonet
But you can create a new index with a different number of shards and put an alias on top of all your indices.
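The two requests involved can be sketched as JSON bodies for the create-index and `_aliases` endpoints. The names `docs_v1`, `docs_v2` and `docs` and the shard counts are hypothetical; the sketch assumes `docs_v1` already sits behind the alias `docs`.

```python
import json

def create_index_body(shards: int, replicas: int) -> dict:
    """Body for PUT /<index>; the shard count is fixed once the index exists."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}

def add_alias_body(index: str, alias: str) -> dict:
    """Body for POST /_aliases: make `alias` also point at `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}

# 1) PUT /docs_v2 with a different shard count (hypothetical values)
print(json.dumps(create_index_body(shards=10, replicas=1)))
# 2) POST /_aliases so that searches on "docs" cover docs_v1 and docs_v2
print(json.dumps(add_alias_body("docs_v2", "docs")))
```

Searches against `docs` then span both indices. New documents are best written to a concrete index such as `docs_v2`, because an alias that points at several indices generally cannot be used as an indexing target.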


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 26 Feb 2013 at 11:10, Jingang Wang <[hidden email]> wrote:


Re: How many shards to set when ~2TB data need to be indexed?

Clinton Gormley-2
In reply to this post by Jingang Wang
On Tue, 2013-02-26 at 02:10 -0800, Jingang Wang wrote:
> Hi Clinton,
>
>
> Thanks for your reply, it helped me a lot. I am a newbie to ES; it's a
> pity that the number of shards can't be changed once it has been set.

It's actually not as problematic in practice as it seems. ES gives you
enormous flexibility because querying one index with 5 shards is
exactly equivalent to querying 5 indices with 1 shard each.

That means you can create extra indices later, and you can use
aliases to make all of this transparent to your application.
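Searching several indices at once can be sketched as below: Elasticsearch accepts a comma-separated list of index names (or an alias) in the search path. The index names are hypothetical, and the shard count only illustrates that both layouts fan out to the same number of shards (one copy, primary or replica, of each).

```python
def search_path(targets: list[str]) -> str:
    """Path for a search over several indices (or aliases) at once."""
    if not targets:
        raise ValueError("need at least one index or alias")
    return "/" + ",".join(targets) + "/_search"

def shards_searched(shards_per_index: dict[str, int], targets: list[str]) -> int:
    """Number of shards a search over `targets` fans out to."""
    return sum(shards_per_index[name] for name in targets)

# Hypothetical layouts: one 5-shard index vs. five 1-shard indices
one_big = {"docs": 5}
five_small = {f"docs_{i}": 1 for i in range(5)}

print(search_path(list(five_small)))  # /docs_0,docs_1,docs_2,docs_3,docs_4/_search
print(shards_searched(one_big, ["docs"]), shards_searched(five_small, list(five_small)))
```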

clint




Re: How many shards to set when ~2TB data need to be indexed?

Jingang Wang
In reply to this post by dadoonet
I didn't know that, so I can create multiple indices and query all of them just as if they were one index.
That sounds great, thanks, David.

On Tuesday, February 26, 2013 6:52:10 PM UTC+8, David Pilato wrote: