Considering scalability, is it right to keep a large number of primary shards at the beginning?

xinmeike

Hi,

I am confused about how to choose the number of primary shards at the beginning.

As we know, the number of shards and replicas can be defined per index when the index is created. After that, we can change the number of replicas dynamically at any time, but we cannot change the number of primary shards. Our ES project may run as a trial version at first, with only 10 machines in the cluster. However, if the project moves to a production environment, the number of machines will grow to 200 or more.
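
To make the distinction concrete, here is a minimal Python sketch of the two requests involved: one that fixes the primary shard count when the index is created, and one that changes only the replica count later. It only builds the request method, path, and JSON body and does not contact a cluster. The index name `trial_v1` and the helper names are hypothetical, chosen for illustration.

```python
import json


def create_index_request(index, primaries, replicas):
    """Build (method, path, body) for creating an index.

    number_of_shards is set here and is fixed once the index exists.
    """
    body = {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }
    return "PUT", f"/{index}", json.dumps(body)


def update_replicas_request(index, replicas):
    """Build (method, path, body) for changing replicas on an existing index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


if __name__ == "__main__":
    # "trial_v1" is a hypothetical index name.
    print(create_index_request("trial_v1", 10, 1))
    print(update_replicas_request("trial_v1", 2))
```

The second request can be sent at any time; there is no equivalent settings update for `number_of_shards`, which is what makes the initial choice feel so heavy.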

How should we decide the number of shards at the beginning? Is it advisable to run 400 or more shards on 10 machines, or will that reduce the performance of the cluster?
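
As a rough illustration of the numbers in this question, the following Python sketch computes how many shard copies each machine would hold on average. The replica count of 1 is an assumption (the post does not state one), and the function name is hypothetical.

```python
def shard_copies_per_node(primaries, replicas, nodes):
    """Average number of shard copies (primaries plus replicas) per node."""
    total_copies = primaries * (1 + replicas)
    return total_copies / nodes


if __name__ == "__main__":
    # 400 primaries, 1 replica each (assumed), on the trial and production sizes.
    for nodes in (10, 200):
        print(nodes, "nodes:", shard_copies_per_node(400, 1, nodes), "copies per node")
```

With these assumed inputs the trial cluster would carry 80 shard copies per machine and the 200-machine cluster 4 per machine, which shows how much of the early layout would exist only for the sake of the later one.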


Thank you for reading. I look forward to your suggestions.


--
Please update your bookmarks! We have moved to https://discuss.elastic.co/
---
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a6eb715a-6543-4882-8635-81a0d22ca1d7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Considering scalability, is it right to keep a large number of primary shards at the beginning?

Mark Walkom-2
You don't want 400 shards on 10 servers. What you do want is the ability to reindex, so that you can reshard to deal with this issue.
Logstash 1.5 can do this very easily; see this example: https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06

However, you probably don't want an index with 200 shards regardless; you may want to take a look at your data structure and split things out.
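
As a rough illustration of the reindexing idea mentioned above, here is a minimal Python sketch. It is not the Logstash configuration from the linked example; it only shows the core step of turning documents read from the old index into a bulk request body aimed at a new index with a different shard count. Reading the hits (for example with a scrolled search) and sending the body to the `_bulk` endpoint are left out, and the function and index names are hypothetical.

```python
import json


def bulk_reindex_body(hits, target_index):
    """Turn search hits from the old index into a bulk 'index' body.

    Each hit is expected to look like a search result entry, with at least
    "_id" and "_source". The bulk format is newline-delimited JSON: one
    action line followed by one document line per hit.
    """
    lines = []
    for hit in hits:
        meta = {"_index": target_index, "_id": hit["_id"]}
        if "_type" in hit:
            # Older Elasticsearch versions expect a document type; keep it if present.
            meta["_type"] = hit["_type"]
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(hit["_source"]))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample_hits = [
        {"_id": "1", "_type": "event", "_source": {"msg": "hello"}},
        {"_id": "2", "_type": "event", "_source": {"msg": "world"}},
    ]
    # "events_v2" is a hypothetical target index created with the new shard count.
    print(bulk_reindex_body(sample_hits, "events_v2"), end="")
```

Keeping the original `_id` means a document copied twice overwrites itself in the new index rather than being duplicated.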


PS - We're moving to https://discuss.elastic.co/, please join us there for any future discussions!




Re: Considering scalability, is it right to keep a large number of primary shards at the beginning?

xinmeike

Thanks for your answer.

Maybe that can work when 10 servers grow to 100 servers, but I am afraid that reindexing onto 100 servers may take a long time and consume a lot of I/O resources. We would need to stop the service for a long time, and all the data would have to be transferred from the old index to the new one.

Is there an easier way to scale horizontally?

P.S. I can't visit https://discuss.elastic.co/ today. It is blank all the time. (・ˇ_ˇ・)



Re: Considering scalability, is it right to keep a large number of primary shards at the beginning?

Mark Walkom-2
You don't need to stop everything to reindex; leverage aliases and you can do it live.
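
A minimal Python sketch of the alias idea, assuming the application always talks to an alias rather than to a concrete index name. Once the new index has been filled, a single request to the `_aliases` endpoint removes the alias from the old index and adds it to the new one. The alias and index names below are hypothetical, and the sketch only builds the request rather than sending it.

```python
import json


def alias_swap_request(alias, old_index, new_index):
    """Build (method, path, body) that moves an alias from old_index to new_index.

    Both actions travel in one request, so clients using the alias are
    switched over in a single step.
    """
    body = {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
    return "POST", "/_aliases", json.dumps(body)


if __name__ == "__main__":
    # Hypothetical names: clients query "events", which points at "events_v1"
    # until the reindex into "events_v2" has finished.
    print(alias_swap_request("events", "events_v1", "events_v2"))
```

Writes that arrive while the copy is still running would need separate handling, which this sketch does not cover.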


Re: Considering scalability, is it right to keep a large number of primary shards at the beginning?

xinmeike

Thank you for your answer! 

I will learn Logstash and take reindexing into consideration.

Best regards,
shinyke
