Throttling / Forcing Garbage Collection during Bulk Indexing


davrob
Hi,

The use case I have is occasional bursts of millions of index updates, which use up a lot of JVM heap space; after these bursts, heap usage drops back to a low level.

I find that the quantity of updates can easily cause OOM exceptions, which crash ElasticSearch.

I am able to monitor the ElasticSearch server node's JVM heap allocation from the client indexing thread, and when heap usage exceeds a certain threshold, I have the opportunity to take some action telling ElasticSearch to reduce its heap allocation somehow.
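(For illustration, a minimal sketch of that kind of client-side monitoring, polling the node stats REST endpoint over plain HTTP; the URL, query parameter, and crude regex parsing are assumptions, not necessarily the exact setup described above:)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapMonitor {
    public static void main(String[] args) throws Exception {
        // Node stats endpoint of a 0.90-era cluster (hypothetical address)
        URL url = new URL("http://localhost:9200/_nodes/stats?jvm=true");
        StringBuilder body = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        } finally {
            in.close();
        }
        // Crude extraction of the first node's heap_used_in_bytes value
        Matcher m = Pattern.compile("\"heap_used_in_bytes\"\\s*:\\s*(\\d+)")
                .matcher(body);
        if (m.find()) {
            long heapUsed = Long.parseLong(m.group(1));
            System.out.println("heap used: " + heapUsed + " bytes");
            // ...compare against a threshold and throttle the indexing thread here
        }
    }
}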

I thought that calling flush() on the index being written to would do this, but actually it has no effect whatsoever. Is there anything I can do to tell ES to reduce its heap usage and force garbage collection?

I considered reducing this JVM parameter: -XX:CMSInitiatingOccupancyFraction=75. But, to be honest, I don't see any attempt by the ES JVM to garbage collect, even when it does reach 75% of the maximum allocated memory.
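(One possible explanation for that, as an aside: HotSpot's CMS collector treats CMSInitiatingOccupancyFraction only as an initial hint and then falls back to its own heuristics, unless the fraction is pinned with a companion flag, e.g.:

-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

With UseCMSInitiatingOccupancyOnly set, CMS should start a concurrent cycle whenever old-generation occupancy crosses 75%.)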

- David.


Re: Throttling / Forcing Garbage Collection during Bulk Indexing

Otis Gospodnetic
Hi,

You could call System.gc(), although:
A) it could be disabled via Java command-line parameters (-XX:+DisableExplicitGC), so you'd want to double-check that's not the case
B) it is not something the JVM will necessarily go and do; it is just a hint to the JVM

There are other parameters, like -XX:SurvivorRatio=n, -XX:NewRatio=n, and -XX:MaxHeapFreeRatio=n, that may help. See http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
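(A minimal sketch of both points, assuming a HotSpot JVM and Java 7's ManagementFactory.getPlatformMXBean; note it has to run inside the JVM whose heap you want collected, i.e. on the ES server node, not in the client:)

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class GcHint {
    public static void main(String[] args) {
        // Point A: check whether explicit GC was disabled on the command line
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        boolean disabled = Boolean.parseBoolean(
                hotspot.getVMOption("DisableExplicitGC").getValue());
        if (!disabled) {
            System.gc(); // Point B: merely a hint; the JVM may still ignore it
        }
    }
}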

Otis
--
ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html




Re: Throttling / Forcing Garbage Collection during Bulk Indexing

joergprante@gmail.com
In reply to this post by davrob
Throttling bulk indexing just by trying to change heap usage behaviour, or by forcing GC, falls way too short.

If you use the Java API, look at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java, where bulk throttling is demonstrated.

By playing with the number of concurrent requests (concurrentRequests) or the bulk size (bulkSize), you can define an upper limit on the bulk throughput. As a result, the heap usage for bulk indexing is limited, too.
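(A minimal sketch of those two knobs, assuming the 0.90-era Java API; the index name, type, listener bodies, and the concrete limits are illustrative:)

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public class ThrottledBulkIndexer {
    public static void run(Client client) {
        BulkProcessor bulkProcessor = BulkProcessor.builder(client,
                new BulkProcessor.Listener() {
                    public void beforeBulk(long executionId, BulkRequest request) {
                        // e.g. log request.numberOfActions()
                    }
                    public void afterBulk(long executionId, BulkRequest request,
                                          BulkResponse response) {
                        // e.g. check response.hasFailures()
                    }
                    public void afterBulk(long executionId, BulkRequest request,
                                          Throwable failure) {
                        failure.printStackTrace();
                    }
                })
                .setBulkActions(1000)                               // flush every 1000 actions...
                .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or every 5 MB
                .setConcurrentRequests(1)  // at most one bulk in flight; add() blocks beyond that
                .build();

        bulkProcessor.add(new IndexRequest("myindex", "mytype")
                .source("{\"field\":\"value\"}"));
        // ... add millions more ...

        bulkProcessor.close(); // flush remaining actions, wait for in-flight bulks
    }
}

(With setConcurrentRequests(0) the bulks execute synchronously in the calling thread, which is the strictest form of throttling.)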

Calling flush() has no effect here because it controls moving data from the translog buffer to the index; it has nothing to do with bulk indexing or with heap usage.

Best regards,

Jörg


Re: Throttling / Forcing Garbage Collection during Bulk Indexing

davrob
Hi Jörg,

That's exactly what I was looking for.  Thanks.

David.
