cyclical ES CPU usage

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

cyclical ES CPU usage

Erik Anderson
Hello all -

I have a single-node ES server at the moment, accepting messages from a single logstash instance, which in turn fetches messages from a redis broker.

We recently had a networking glitch that caused messages to queue up for a few days. I have logs flowing again now and LS/ES are slowly catching up. However, during this time, I've observed a peculiar CPU usage pattern caused by elasticsearch. LS/ES have been ingesting messages as fast as possible during this "catch up" period. During this time, this is what the CPU usage of our LS/ES node looked like:

http://photos.smugmug.com/photos/i-PjvfrT5/0/O/i-PjvfrT5.png

That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing rates actually slow down to the point where it's actually slower than our systems are generating logs (which is quite slow at this point, something in the neighorhood of 5 messages/second.).

During times of very high message rates (such as when our system was catching up with a few day's queued logs), is there some sort of internal queueing mechanism that happens within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Erik Anderson
Does anyone have a clue about this?

Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and continue to see the same behavior.

Thank you!
-Erik


On Wed, Oct 2, 2013 at 2:16 PM, Erik Anderson <[hidden email]> wrote:
Hello all -

I have a single-node ES server at the moment, accepting messages from a single logstash instance, which in turn fetches messages from a redis broker.

We recently had a networking glitch that caused messages to queue up for a few days. I have logs flowing again now and LS/ES are slowly catching up. However, during this time, I've observed a peculiar CPU usage pattern caused by elasticsearch. LS/ES have been ingesting messages as fast as possible during this "catch up" period. During this time, this is what the CPU usage of our LS/ES node looked like:


That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing rates actually slow down to the point where it's actually slower than our systems are generating logs (which is quite slow at this point, something in the neighorhood of 5 messages/second.).

During times of very high message rates (such as when our system was catching up with a few day's queued logs), is there some sort of internal queueing mechanism that happens within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Alexander Reelsen-2
Hey,

I would be interested to see the nodes stats during such a high CPU time in order to find out, where the CPU spends its time at. Also you can maybe catch the hot_threads output as well. Using the nodes stats you can actually get an indication, if the CPU time is spent for indexing, merging or other activities.


Hope this helps. Have you changed anything from the default configuration (throttling for example)?


--Alex


On Fri, Oct 4, 2013 at 10:21 PM, Erik Anderson <[hidden email]> wrote:
Does anyone have a clue about this?

Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and continue to see the same behavior.

Thank you!
-Erik


On Wed, Oct 2, 2013 at 2:16 PM, Erik Anderson <[hidden email]> wrote:
Hello all -

I have a single-node ES server at the moment, accepting messages from a single logstash instance, which in turn fetches messages from a redis broker.

We recently had a networking glitch that caused messages to queue up for a few days. I have logs flowing again now and LS/ES are slowly catching up. However, during this time, I've observed a peculiar CPU usage pattern caused by elasticsearch. LS/ES have been ingesting messages as fast as possible during this "catch up" period. During this time, this is what the CPU usage of our LS/ES node looked like:


That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing rates actually slow down to the point where it's actually slower than our systems are generating logs (which is quite slow at this point, something in the neighorhood of 5 messages/second.).

During times of very high message rates (such as when our system was catching up with a few day's queued logs), is there some sort of internal queueing mechanism that happens within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Otis Gospodnetic
In reply to this post by Erik Anderson
Hi Erik,

My first guess would be that this is caused by Lucene segment merging.

Otis
--
ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html


On Friday, October 4, 2013 4:21:34 PM UTC-4, Erik Anderson wrote:
Does anyone have a clue about this?

Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and continue to see the same behavior.

Thank you!
-Erik


On Wed, Oct 2, 2013 at 2:16 PM, Erik Anderson <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="g8YKNXSiM8wJ">erik...@...> wrote:
Hello all -

I have a single-node ES server at the moment, accepting messages from a single logstash instance, which in turn fetches messages from a redis broker.

We recently had a networking glitch that caused messages to queue up for a few days. I have logs flowing again now and LS/ES are slowly catching up. However, during this time, I've observed a peculiar CPU usage pattern caused by elasticsearch. LS/ES have been ingesting messages as fast as possible during this "catch up" period. During this time, this is what the CPU usage of our LS/ES node looked like:


That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing rates actually slow down to the point where it's actually slower than our systems are generating logs (which is quite slow at this point, something in the neighorhood of 5 messages/second.).

During times of very high message rates (such as when our system was catching up with a few day's queued logs), is there some sort of internal queueing mechanism that happens within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="g8YKNXSiM8wJ">elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Erik Anderson
In reply to this post by Alexander Reelsen-2
Hi Alexander -

During times of high load, it looks like the largest slice of CPU is being dedicated to the Lucene Merge Thread.

Cluster node stats:

Hot Threads:

-Erik


On Sun, Oct 6, 2013 at 12:09 PM, Alexander Reelsen <[hidden email]> wrote:
Hey,

I would be interested to see the nodes stats during such a high CPU time in order to find out, where the CPU spends its time at. Also you can maybe catch the hot_threads output as well. Using the nodes stats you can actually get an indication, if the CPU time is spent for indexing, merging or other activities.


Hope this helps. Have you changed anything from the default configuration (throttling for example)?


--Alex


On Fri, Oct 4, 2013 at 10:21 PM, Erik Anderson <[hidden email]> wrote:
Does anyone have a clue about this?

Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and continue to see the same behavior.

Thank you!
-Erik


On Wed, Oct 2, 2013 at 2:16 PM, Erik Anderson <[hidden email]> wrote:
Hello all -

I have a single-node ES server at the moment, accepting messages from a single logstash instance, which in turn fetches messages from a redis broker.

We recently had a networking glitch that caused messages to queue up for a few days. I have logs flowing again now and LS/ES are slowly catching up. However, during this time, I've observed a peculiar CPU usage pattern caused by elasticsearch. LS/ES have been ingesting messages as fast as possible during this "catch up" period. During this time, this is what the CPU usage of our LS/ES node looked like:


That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing rates actually slow down to the point where it's actually slower than our systems are generating logs (which is quite slow at this point, something in the neighorhood of 5 messages/second.).

During times of very high message rates (such as when our system was catching up with a few day's queued logs), is there some sort of internal queueing mechanism that happens within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Erik Anderson
In reply to this post by Otis Gospodnetic
On Mon, Oct 14, 2013 at 11:50 PM, Otis Gospodnetic <[hidden email]> wrote:
My first guess would be that this is caused by Lucene segment merging.

Thanks, Otis - you were spot on with this. See my previous email to Alexsander from just a minute ago.

Now...to research what to do about it. :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

shadyabhi
Hi Erik,

If you update your documents or add new documents in a index, merging
is bound to happen. There is nothing you can do about it, that's how
lucene does it. You could try changing the merge policy
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
if you want to play around.

On Fri, Oct 18, 2013 at 4:17 AM, Erik Anderson <[hidden email]> wrote:

> On Mon, Oct 14, 2013 at 11:50 PM, Otis Gospodnetic
> <[hidden email]> wrote:
>>
>> My first guess would be that this is caused by Lucene segment merging.
>
>
> Thanks, Otis - you were spot on with this. See my previous email to
> Alexsander from just a minute ago.
>
> Now...to research what to do about it. :)
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email].
> For more options, visit https://groups.google.com/groups/opt_out.



--
Regards,
Abhijeet Rastogi (shadyabhi)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Otis Gospodnetic
In reply to this post by Erik Anderson
Hi Erik,

You can throttle merges.
http://search-lucene.com/?q=throttle+merge&fc_project=ElasticSearch

Otis
--
ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html


On Thursday, October 17, 2013 6:47:27 PM UTC-4, Erik Anderson wrote:
On Mon, Oct 14, 2013 at 11:50 PM, Otis Gospodnetic <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="YUg4eJnFkeQJ">otis.gos...@...> wrote:
My first guess would be that this is caused by Lucene segment merging.

Thanks, Otis - you were spot on with this. See my previous email to Alexsander from just a minute ago.

Now...to research what to do about it. :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Reply | Threaded
Open this post in threaded view
|

Re: cyclical ES CPU usage

Erik Anderson
Just to follow up on this - it turns out that the poor performance I was seeing actually had nothing to do with ES merging. Rather, it was due to the fact that I set flush_size equal to 1 in the elasticsearch_http logstash output plugin. I changed that to 200, and my indexing rates went from 15-20/second to nearly 500/second.

-Erik


On Thu, Oct 17, 2013 at 11:44 PM, Otis Gospodnetic <[hidden email]> wrote:
Hi Erik,

You can throttle merges.

Otis
--
ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html


On Thursday, October 17, 2013 6:47:27 PM UTC-4, Erik Anderson wrote:
On Mon, Oct 14, 2013 at 11:50 PM, Otis Gospodnetic <[hidden email]> wrote:
My first guess would be that this is caused by Lucene segment merging.

Thanks, Otis - you were spot on with this. See my previous email to Alexsander from just a minute ago.

Now...to research what to do about it. :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.