Looking for a suggestion to better organize our indices for performance


Looking for a suggestion to better organize our indices for performance

Ron Sher
Hi,

We have a multi-tenant SaaS application in which we keep data for all accounts of our clients (there are 300 clients, which we call services).
We keep the data in monthly indices, which have grown to about 700GB and 4.6 billion documents per month.
Each day we index a new account for each service.

Each index is built from 6 shards and we use 1 replica.

We're starting to have second thoughts about the structure of our indices and about the proper use of routing.
A few things we are contemplating:
  • Use routing by service, so that we would probably benefit more from caching.
  • Split the indices by service + month, so that each query touches much less data. This would add many indices, though: instead of 12 indices a year we would have 300x12, and more as the number of clients grows.
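
To put numbers on the second option, here is a rough sketch of the index and shard counts per year for both layouts. It assumes the per-service indices would keep the same 6 shards and 1 replica as today, which the post does not state; the function and variable names are made up for illustration.

```python
# Figures quoted in the post.
SERVICES = 300
MONTHS_PER_YEAR = 12
PRIMARY_SHARDS = 6   # assumption: per-service indices keep the current shard count
REPLICAS = 1         # assumption: and the current replica count


def shard_copies(indices, primaries=PRIMARY_SHARDS, replicas=REPLICAS):
    """Total shard copies (primaries plus replicas) across `indices` indices."""
    return indices * primaries * (1 + replicas)


monthly_indices = MONTHS_PER_YEAR                 # one index per month
per_service_indices = SERVICES * MONTHS_PER_YEAR  # one index per service per month

for label, count in (("monthly", monthly_indices),
                     ("service + month", per_service_indices)):
    print(f"{label}: {count} indices, {shard_copies(count)} shard copies per year")
```

Under these assumptions the cluster would go from 144 shard copies a year to 43,200, so the per-service layout would most likely need far fewer shards per index to stay manageable.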
Any thoughts/suggestions?

Thanks,
Ron

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/096f4d7b-2702-46d7-96d6-f746f2389623%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Looking for a suggestion to better organize our indices for performance

Mark Walkom-2
How many servers are in this cluster?


Re: Looking for a suggestion to better organize our indices for performance

Ron Sher
We have 24 data nodes, 3 master nodes and 3 client nodes.
We use m3.4xlarge for the data nodes.


Re: Looking for a suggestion to better organize our indices for performance

Mark Walkom-2
Currently you have shards of over 100GB each, which is massive and is probably causing you some issues. Ideally you should aim for a maximum shard size of 40-50GB, so increasing your shard count to 24 brings you under this level and also gives you room for growth at the index level.

Having a higher shard count also spreads the query load and reduces the amount of thrashing (i.e. data transfer) if/when a node goes down.
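
The arithmetic behind these numbers can be checked with a short sketch. It assumes the 700GB monthly index is spread evenly across its primary shards, which is only an approximation; the function names are illustrative.

```python
import math

INDEX_SIZE_GB = 700  # monthly index size quoted earlier in the thread


def shard_size_gb(index_size_gb, primary_shards):
    """Average primary shard size, assuming data is spread evenly."""
    return index_size_gb / primary_shards


def min_shards_for(index_size_gb, max_shard_gb):
    """Smallest primary shard count that keeps the average shard under the cap."""
    return math.ceil(index_size_gb / max_shard_gb)


for shards in (6, 24):
    size = shard_size_gb(INDEX_SIZE_GB, shards)
    print(f"{shards} shards: about {size:.1f}GB per shard")

for cap in (40, 50):
    print(f"cap of {cap}GB: at least {min_shards_for(INDEX_SIZE_GB, cap)} shards")
```

With 6 shards the average is roughly 117GB per shard; with 24 it drops to roughly 29GB, which leaves headroom under the 40-50GB target. 24 also matches the number of data nodes, so each node can hold one primary shard of the index.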


Re: Looking for a suggestion to better organize our indices for performance

Jilles van Gurp
Indeed, increase your shard count. Also, you may want to consider using a routing parameter based on e.g. a tenant_id, to ensure that all queries related to a tenant only hit shards that actually have data for that tenant. Those two measures would reduce the size of each shard and the number of shards involved for each tenant. To increase query capacity, you could also consider increasing the number of replicas; that way, you have more nodes that can handle query traffic for the same data.

Jilles
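
As a rough illustration of the routing idea, the sketch below shows why a routed query only needs one shard: the routing value is hashed and mapped to a single shard, so every document indexed with the same tenant_id lands in the same place. The CRC32 hash is only a stand-in for whatever hash Elasticsearch uses internally, and the index name and tenant id are hypothetical. The `routing` query-string parameter itself is a real option on index and search requests.

```python
import zlib
from urllib.parse import urlencode

NUM_SHARDS = 24  # shard count suggested earlier in the thread


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to one shard number.

    Stand-in hash (CRC32) for illustration only; Elasticsearch applies
    its own hash function, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def search_path(index, tenant_id):
    """Build a search path that passes the tenant id as the routing value."""
    return f"/{index}/_search?" + urlencode({"routing": tenant_id})


tenant = "tenant-42"  # hypothetical tenant id
print("shard for", tenant, "->", shard_for(tenant), "of", NUM_SHARDS)
print(search_path("data-2014-12", tenant))  # hypothetical index name
```

Several tenants can still share a shard, so a routed query should keep its filter on tenant_id. The same routing value has to be supplied at index time and at search time.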

Re: Looking for a suggestion to better organize our indices for performance

Ron Sher
BTW, we use c3.4xlarge instances for the data nodes, not m3.4xlarge as I said before.
