High CPU usage question


High CPU usage question

snmaynard
We've been using Elasticsearch in production for a month or so, and I'm seeing strange CPU usage patterns: periods of really high CPU, followed by long stretches of really low CPU. These spikes cause timeouts, so I'm trying to get to the bottom of them.

Firstly, I'm running on an AWS small instance, so it could just be that the instance is too small, but I don't think so, as CPU usage can be close to zero for long periods.

I've run hotthreads a couple of times, and the output is available here and here
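For anyone reproducing this, here is a minimal sketch of building the hot threads request URL. It assumes the `/_nodes/hot_threads` endpoint with `threads`, `interval` and `type` parameters; some older releases also registered it under `/_cluster/nodes/hot_threads`, so check the docs for your version. The helper `hot_threads_url` and the localhost base URL are hypothetical.

```python
from urllib.parse import urlencode


def hot_threads_url(base="http://localhost:9200", threads=3, interval="500ms", kind="cpu"):
    """Build a hot threads URL (hypothetical helper; endpoint and params assumed).

    threads:  how many of the busiest threads to report per node
    interval: sampling interval used to measure thread activity
    kind:     what to sample, e.g. "cpu", "wait" or "block"
    """
    query = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{base.rstrip('/')}/_nodes/hot_threads?{query}"
```

Fetching this URL a few times during a spike and a few times during a quiet period gives directly comparable snapshots.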

I've also included copies of Munin graphs, so you can see what I mean by strange usage patterns. There's no real increase in traffic that would explain this. Is this kind of "hitting a cliff" in performance something you would expect to see with Elasticsearch, or is something else happening here that would explain it? I am running 0.20.2 in production.


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: High CPU usage question

kimchy
Administrator
Seems like it's busy doing a refresh at the time in question (making recent changes available for search). It might also be that large merges are happening at that time; you can use the node stats API with the indices flag to check whether a large merge is ongoing. If that's the case, you can use merge throttling to keep it at bay.
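A minimal sketch of the merge check described above, assuming you have already fetched the node stats JSON (the exact path differs between releases, for example `/_cluster/nodes/stats` or `/_nodes/stats`, with indices stats included) and parsed it into a dict. The response shape (`nodes` → `indices` → `merges` → `current`, `current_size_in_bytes`) and the store throttling setting names are assumptions to verify against the docs for your version; `ongoing_merges` and `merge_throttle_settings` are hypothetical helpers.

```python
def ongoing_merges(node_stats):
    """List (node name, merges in progress, bytes being merged) for busy nodes.

    Assumes the parsed node stats JSON looks like
    {"nodes": {node_id: {"name": ..., "indices": {"merges": {...}}}}}.
    """
    busy = []
    for node_id, node in node_stats.get("nodes", {}).items():
        merges = node.get("indices", {}).get("merges", {})
        current = merges.get("current", 0)
        if current > 0:
            busy.append((node.get("name", node_id), current,
                         merges.get("current_size_in_bytes", 0)))
    return busy


def merge_throttle_settings(max_bytes_per_sec="20mb"):
    """Body for a cluster settings update enabling store-level merge throttling.

    The setting names are an assumption based on the store throttling docs;
    the default rate here is only an example value.
    """
    return {
        "persistent": {
            "indices.store.throttle.type": "merge",
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }
```

The dict from `merge_throttle_settings` would be sent as the body of `PUT /_cluster/settings`; the rate is a starting point to tune, not a recommendation.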

On Feb 28, 2013, at 5:18 AM, Simon <[hidden email]> wrote:


Re: High CPU usage question

snmaynard
Previously I was updating documents by inserting a document with a duplicate id, since my driver didn't support upsert, and that seemed to work OK.

On a gut feeling that this might be causing the issue, I switched from inserting an object with a duplicate id to using the upsert API.

Within seconds of deploying the code, CPU usage dropped and the load average fell from 3.5+ to around 0.1. I indexed a new document and everything is still working.

Does this sound right to you guys? Is there a bug here, or was I just doing something you should never do?
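To make the two approaches concrete, here is a sketch of the request shapes, assuming the REST API of that era: indexing with an existing id (`PUT /{index}/{type}/{id}`) replaces the whole document, while the update API (`POST /{index}/{type}/{id}/_update`) takes a partial `doc` plus an `upsert` document that is indexed when the id doesn't exist yet. Whether your release accepts `doc` together with `upsert` (rather than only a script) is an assumption to check; the helpers and the index, type and field names are hypothetical.

```python
import json


def reindex_request(index, doc_type, doc_id, doc):
    """Index with an existing id: the whole document is sent and replaced."""
    return ("PUT", f"/{index}/{doc_type}/{doc_id}", json.dumps(doc))


def upsert_request(index, doc_type, doc_id, partial, initial):
    """Update API: merge `partial` into the stored doc, or index `initial` if missing.

    Assumes the release supports a partial "doc" combined with "upsert".
    """
    body = {"doc": partial, "upsert": initial}
    return ("POST", f"/{index}/{doc_type}/{doc_id}/_update", json.dumps(body))
```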

On Wednesday, February 27, 2013 11:59:28 PM UTC-8, kimchy wrote:

Re: High CPU usage question

kimchy
Administrator
That sounds strange. Effectively, update internally does the same thing as indexing with the same id. Is ES the only process running on the machine?
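A toy in-memory model of that point, not Elasticsearch code: an update reads the stored source, merges the partial document into it, and then writes a complete new version, just as indexing with the same id writes a complete new version. In Lucene terms the old copy is marked deleted and a new one is added either way, so both paths should cost roughly the same. `simulated_index`, `simulated_update` and the recursive merge are hypothetical simplifications.

```python
import copy


def merge_partial(source, partial):
    """Recursively merge a partial document into a stored source (simplified)."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def simulated_index(store, doc_id, doc):
    """Index with an id: always writes a whole new version of the document."""
    version = store[doc_id][0] + 1 if doc_id in store else 1
    store[doc_id] = (version, copy.deepcopy(doc))
    return store[doc_id]


def simulated_update(store, doc_id, partial, upsert=None):
    """Update: read, merge, then write a whole new version (like an index call)."""
    if doc_id in store:
        version, source = store[doc_id]
        store[doc_id] = (version + 1, merge_partial(source, partial))
    elif upsert is not None:
        store[doc_id] = (1, copy.deepcopy(upsert))
    else:
        raise KeyError(doc_id)
    return store[doc_id]
```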

On Mar 2, 2013, at 4:06 AM, Simon <[hidden email]> wrote:


Re: High CPU usage question

snmaynard
Yeah, it's the only thing on the machine. It could just be a coincidence that CPU usage dropped right after the change; as I said, it did jump around a fair bit. But load has been around 0.1 overnight, so the change seems to have had some kind of effect.

I might switch back to the old method to see whether there is a strong correlation with CPU usage, and report back.

On Saturday, March 2, 2013 at 6:14 AM, [hidden email] wrote:


Re: High CPU usage question

acv2
This post has NOT been accepted by the mailing list yet.
I'm using Elasticsearch 1.5.

It works perfectly most of the time, but every day at the same time it goes crazy: CPU usage jumps to ~70% when the average is around 3-5%. These are SUPER servers with 32GB reserved for Lucene, swap is locked, and clearing the cache doesn't solve the problem (it doesn't bring heap memory usage down).

Settings:

- 3 servers (nodes), 32 cores and 128GB RAM each
- 2 buckets (indices): one with ~18 million documents (this one doesn't receive updates very often, mostly just new documents being indexed); the other has around 7-8 million documents, but we are constantly bombarding it with updates, searches, deletes, and indexing as well

The best distribution for our setup turned out to be only 1 shard per node with no replicas. We can afford to have a portion of the data unavailable for a few seconds; it comes back as soon as the server is online again, and this is fast enough since nothing needs to be relocated. Previously we had 3 shards with 1 replica, but the issue described above happened then as well, so it's easy to conclude that the problem is not related to the shard distribution.
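A quick sketch of the two layouts described above, assuming shard copies are spread evenly across the nodes: `create_index_body` builds the settings body you might send when creating an index, and `copies_per_node` does the arithmetic. Both helpers are hypothetical.

```python
def create_index_body(shards, replicas):
    """Settings body for index creation (hypothetical helper)."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def copies_per_node(shards, replicas, nodes):
    """Average shard copies (primaries plus replicas) per node, assuming an even spread."""
    total_copies = shards * (1 + replicas)
    return total_copies / nodes
```

With 3 nodes, the current layout (3 shards, 0 replicas) works out to 1 copy per node per index, versus 2 per node with the old 3-shard, 1-replica layout.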

Things I have already tried:

- Merging: I tried using the Optimize API to take some load off the scheduled merges. The merge process does a lot of disk reads/writes, but it doesn't substantially affect memory or CPU load.

- Flushing: I tried flushing with both long and short intervals, and the results were the same; nothing changed. That fits, since flushing directly affects the merging process and, as mentioned above, merging doesn't take much CPU or memory.

- Managing the cache: I cleared it manually, but that doesn't bring the CPU load back to its normal state, not even for a moment.
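For reference, a sketch of the three calls mentioned above as 1.x REST requests: `_optimize` (later renamed `_forcemerge`), `_flush` and `_cache/clear`. The helper functions are hypothetical and only build the method and path; sending them is left to whatever HTTP client you use.

```python
from urllib.parse import urlencode


def optimize_request(index, max_num_segments=None, only_expunge_deletes=False):
    """POST /{index}/_optimize, optionally capping the resulting segment count."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    path = f"/{index}/_optimize"
    if params:
        path += "?" + urlencode(params)
    return ("POST", path)


def flush_request(index):
    """POST /{index}/_flush: commit the translog contents to the index."""
    return ("POST", f"/{index}/_flush")


def clear_cache_request(index):
    """POST /{index}/_cache/clear: drop the index caches."""
    return ("POST", f"/{index}/_cache/clear")
```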

Here are most of the elasticsearch.yml settings:

elasticsearch-yml.txt

Here are the node stats when the server is in a normal state:
node_stats_normal.txt


Node stats during the problem:
node_stats.txt
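When comparing node_stats_normal.txt with node_stats.txt, a sketch like this can surface the numeric fields that changed most between the two snapshots, assuming both files hold the JSON from `GET /_nodes/stats` parsed into dicts. Keep in mind that many fields are cumulative counters since node start, so large differences can reflect elapsed time rather than the spike itself. `flatten` and `biggest_changes` are hypothetical helpers.

```python
def flatten(obj, path=()):
    """Flatten nested dicts into {"a.b.c": number}, keeping only numeric leaves."""
    if isinstance(obj, dict):
        flat = {}
        for key, value in obj.items():
            flat.update(flatten(value, path + (str(key),)))
        return flat
    if isinstance(obj, (int, float)) and not isinstance(obj, bool):
        return {".".join(path): obj}
    return {}


def biggest_changes(normal, problem, top=5):
    """Return (field, before, after) for the fields with the largest absolute change."""
    before, after = flatten(normal), flatten(problem)
    shared = before.keys() & after.keys()
    changes = [(key, before[key], after[key]) for key in shared]
    # Largest absolute change first; ties broken by field name for stable output.
    changes.sort(key=lambda c: (-abs(c[2] - c[1]), c[0]))
    return changes[:top]
```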

[Graph: the server in a normal state]

[Graph: the server under heavy CPU load]

I'd appreciate any help or discussion that can point me in the right direction to get rid of this behavior.

Thanks in advance.

Regards,

Daniel