How many indexes is too many indexes?


How many indexes is too many indexes?

kmoore.cce
I understand this may depend on a lot of factors, but I am curious about what an efficient number of indexes is for a large data set.

I would like to break up indexes by user and by date (I think) mostly because it will make data management easier on my end.

I am wondering when Elasticsearch will have issues with the number of indexes. For example is 10 a good number? 100? 1000? 10000? etc.

I would like to break up the indexes as much as possible and make use of aliases for searching the data of interest, but I don't want to create so many indexes that it will have an adverse effect on performance.
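To make the per-user, per-date layout concrete, here is a minimal sketch that builds the request body for an alias update grouping one user's daily indexes under a single alias. The naming scheme `logs-<user>-<YYYY.MM.DD>`, the alias name, and the helper functions are all hypothetical; only the general `{"actions": [{"add": {...}}]}` shape of an alias update is assumed from Elasticsearch, and it should be checked against the version in use.

```python
from datetime import date, timedelta


def index_name(user: str, day: date) -> str:
    # Hypothetical naming scheme: one index per user per day.
    return f"logs-{user}-{day:%Y.%m.%d}"


def alias_actions(user: str, first_day: date, days: int) -> dict:
    """Build an alias-update body that points one alias at `days` daily indexes."""
    alias = f"logs-{user}-recent"  # hypothetical alias name
    actions = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        actions.append({"add": {"index": index_name(user, day), "alias": alias}})
    return {"actions": actions}


body = alias_actions("alice", date(2014, 10, 7), 3)
for action in body["actions"]:
    print(action)
```

Searching the alias then touches every index behind it, so the number of indexes (and shards) per alias matters as much as the total.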

I would appreciate any insight into what is recommended and what others have experienced.

Thanks in advance.

-Kevin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/71c384b4-0dc8-4c98-8ef2-5b00872754e7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How many indexes is too many indexes?

Mark Walkom
This depends entirely on your data structure, volume, and cluster sizing.
Hundreds of indexes work; thousands should be OK if you have a lot of nodes; tens of thousands need even more nodes.

Aliases will also affect your requirements.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com

In reply to kmoore.cce's post of 9 October 2014 00:19.

Re: How many indexes is too many indexes?

Kang-min Liu
In reply to this post by kmoore.cce

> I am wondering when Elasticsearch will have issues with the number of
> indexes. For example is 10 a good number? 100? 1000? 10000? etc.
>
> I would like to break up the indexes as much as possible and make use of
> aliases for searching the data of interest, but I don't want to create so
> many indexes that it will have an adverse effect on performance.

For reference, one issue that I ran into with daily indices is the limit on file
descriptors per node.

I wanted to maximize write performance, so the number of shards was set
to 24, matching the number of CPU cores. That does not seem like
much, but after a few weeks the number of open FDs reached the 65k limit before the
disk space ran out, and this setting needed to be changed.

At this point the cluster has 6 nodes and 4 new indices per day, and we keep
about 80 days of data. We do not run cross-day queries, though.
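As a rough sanity check, the shard count implied by these figures can be worked out directly. This is only a sketch under stated assumptions: the 24 shards are taken to apply to every index, replicas are set to zero because the post does not mention them, and "65k" is treated as 65,000.

```python
def shard_footprint(indices_per_day, days_kept, shards_per_index, nodes, replicas=0):
    """Return (total indices, total shards, average shards per node)."""
    total_indices = indices_per_day * days_kept
    total_shards = total_indices * shards_per_index * (1 + replicas)
    return total_indices, total_shards, total_shards / nodes


# Figures from the post; replicas=0 is an assumption.
total_indices, total_shards, per_node = shard_footprint(4, 80, 24, 6)

fd_limit = 65_000  # "65k" from the post, taken as an approximation
fd_budget_per_shard = fd_limit / per_node

print(f"indices: {total_indices}")
print(f"shards: {total_shards}")
print(f"shards per node: {per_node:.0f}")
print(f"FD budget per shard: {fd_budget_per_shard:.1f}")
```

Each shard is a Lucene index made up of several segment files, so a budget of roughly 50 descriptors per shard is easy to exhaust. Any replicas would shrink that budget further.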

--
Cheers,
Kang-min Liu


Re: How many indexes is too many indexes?

Mark Walkom
Did you get better writes?
What sort of storage are you on? Did you measure before and after? Are you reaching I/O limits?

Regards,
Mark Walkom

In reply to Kang-min Liu's post of 9 October 2014 17:33.

Re: How many indexes is too many indexes?

Kang-min Liu

Mark Walkom writes:

> Did you get better writes?
> What sort of storage are you on, did you measure before and after, are you
> reaching I/O limits?

We pump real-time log data and only measure the overall processing
throughput, not low-level I/O throughput (we had the data, but we
did not correlate it with the setting change). The disks are ordinary
hard drives, not SSDs or hybrid disks. We did not reach disk or
network I/O limits before or afterwards. The FD limit was the only limit
we ran into.
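One way to see this limit coming is to compare open against maximum file descriptors on each node. The sketch below assumes a nodes-stats style response in which every node entry carries `process.open_file_descriptors` and `process.max_file_descriptors`. The sample values, node names, and the 0.8 warning threshold are made up, and the field names should be checked against the Elasticsearch version in use.

```python
def fd_usage(nodes_stats: dict, warn_ratio: float = 0.8) -> dict:
    """Map node name -> (open/max ratio, True if at or above warn_ratio)."""
    report = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        process = node.get("process", {})
        open_fds = process.get("open_file_descriptors")
        max_fds = process.get("max_file_descriptors")
        if open_fds is None or not max_fds or max_fds <= 0:
            continue  # field missing or unusable on this node; skip it
        ratio = open_fds / max_fds
        report[node.get("name", node_id)] = (ratio, ratio >= warn_ratio)
    return report


# Hypothetical sample shaped like a parsed nodes-stats response.
sample = {
    "nodes": {
        "id-1": {"name": "node-a",
                 "process": {"open_file_descriptors": 60000,
                             "max_file_descriptors": 65536}},
        "id-2": {"name": "node-b",
                 "process": {"open_file_descriptors": 12000,
                             "max_file_descriptors": 65536}},
    }
}

report = fd_usage(sample)
for name, (ratio, warn) in sorted(report.items()):
    print(f"{name}: {ratio:.0%} of FD limit{' (warning)' if warn else ''}")
```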

--
Cheers,
Kang-min Liu


Re: How many indexes is too many indexes?

kmoore.cce
Thank you for the feedback guys, it is greatly appreciated.
I had not thought about file descriptors so that gives me another thing to think about.

Our daily volume will be pretty high across all of our users. I don't think we have a great estimate, but right now we are at about 50 million documents a day and ~30 users.
Our cluster is in EC2 so we can adjust size and nodes basically whenever we need to, so I don't think that is a huge issue assuming we get the index layout correct.
At present we are using one large index which is causing some performance issues as you would expect. We also did not get our sharding correct originally so now we have really large shards.

We have a requirement to keep 90 days of data per user. The upper bound on users is roughly 1000 (though that is indeterminate). So that would be 90,000 indexes if we did it by day.
I guess I am wondering if that is a crazy thing to attempt to do, or if it makes more sense to break it up weekly or monthly instead in order to keep the index count down.
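The index counts for the different granularities can be compared with a few lines of arithmetic. This is a sketch using the figures from this post (1000 users, 90 days of retention); treating a week as 7 days and a month as 30 days is a simplification, and a rolling retention window can straddle one extra period per user.

```python
import math


def index_count(users: int, retention_days: int, days_per_index: int) -> int:
    """Per-user indexes needed to cover the retention window."""
    return users * math.ceil(retention_days / days_per_index)


users, retention_days = 1000, 90

daily = index_count(users, retention_days, 1)
weekly = index_count(users, retention_days, 7)
monthly = index_count(users, retention_days, 30)  # month approximated as 30 days

print(f"daily:   {daily}")
print(f"weekly:  {weekly}")
print(f"monthly: {monthly}")
```

The shard count is each of these figures multiplied by the shards per index (and by one plus the replica count), which is the number that drives file descriptor and memory use per node.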

Our documents are usually pretty small (or what I would consider small) at <= 1K, but we will receive them basically constantly.
So I guess I am looking for tips on how we can lay out and break up indexes to get the best performance as we grow.
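For sizing, the raw data volume implied by these numbers can be estimated as below. It is a back-of-the-envelope sketch: "1K" is taken as 1024 bytes and treated as the size of every document (so this is an upper bound), and index overhead, compression, and replicas are all ignored.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

docs_per_day = 50_000_000   # current volume from the post
max_doc_bytes = 1024        # "<= 1K" taken as 1024 bytes (assumption)
users = 30                  # current user count from the post
retention_days = 90

raw_bytes_per_day = docs_per_day * max_doc_bytes
raw_bytes_retained = raw_bytes_per_day * retention_days
avg_bytes_per_user_day = raw_bytes_per_day / users

print(f"raw per day:      {raw_bytes_per_day / GIB:.1f} GiB")
print(f"raw over 90 days: {raw_bytes_retained / TIB:.2f} TiB")
print(f"per user per day: {avg_bytes_per_user_day / GIB:.2f} GiB (average)")
```

On these assumptions an average per-user daily index holds well under 2 GiB of raw data, which gives a sense of how small each index would be at daily granularity.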

Again, thank you for the feedback, and thanks in advance for any more!

Thanks,

In reply to gugod's post of Thursday, October 9, 2014 1:18:07 PM UTC-4.