Apparent memory leak after a few days of heavy indexing


rafe
I have a two-node cluster. Each node runs with a 16GB heap. The nodes run on their own boxes and have plenty of CPU to themselves. Currently, the cluster's only workload is indexing at pretty high volume. We are indexing using the bulk index API, and are sending about 10 batches of 400 documents per second. We're using the Java client, specifically TransportClient.
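
Roughly, the indexing path looks like the following simplified sketch (host names, index name, and document fields here are placeholders, not the real ones):

import java.io.IOException;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class BulkIndexSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder host names; point these at the two data nodes.
        TransportClient client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("es-node-1", 9300))
                .addTransportAddress(new InetSocketTransportAddress("es-node-2", 9300));

        // One bulk request of ~400 documents; this runs about 10 times per second.
        BulkRequestBuilder bulk = client.prepareBulk();
        for (int i = 0; i < 400; i++) {
            bulk.add(client.prepareIndex("events-2013.07.18", "event")
                    .setSource(jsonBuilder()
                            .startObject()
                            .field("message", "placeholder document " + i)
                            .field("timestamp", System.currentTimeMillis())
                            .endObject()));
        }

        BulkResponse response = bulk.execute().actionGet();
        if (response.hasFailures()) {
            // Per-item failures (e.g. rejections) are reported here, not as exceptions.
            System.err.println(response.buildFailureMessage());
        }

        client.close();
    }
}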

Things work well for a little while (1-2 days), but eventually, the cluster falls over -- see the heap usage chart from Graphite. This is for just one host, but the memory behavior is identical across the two nodes.

[Heap usage chart from Graphite; also available at http://i.imgur.com/vYeDEsc.png]

This looks like a memory leak to me. The logs don't show anything out of the ordinary happening when heap usage started increasing linearly. A few common sources of memory issues that I've already ruled out:
  • The cache. Since the workload is basically entirely indexing, the filter and field caches shouldn't even come into play here. Either way, both are configured to be limited in size, so this can't be it.
  • "Not enough heap" -- I can reproduce this no matter how much heap space I give my nodes.
Anyone seen this before, or any ideas as to what my issue might be here? I've turned on TRACE logging for Lucene merges, but I don't think I'll get any good data out of that until my cluster crashes again.

Rafe


Re: Apparent memory leak after a few days of heavy indexing

joergprante@gmail.com
You don't tell us the ES version or the JVM heap memory you have configured, so I assume standard settings? (The image is not viewable.)

I recommend increasing the heap, and also the following settings for smoother indexing (lower peak resource usage):

index.merge.policy.max_merged_segment: 2g
index.merge.policy.segments_per_tier: 24
index.merge.policy.max_merge_at_once: 8

And for better fault tolerance, consider a three-node cluster.
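
These settings can go in elasticsearch.yml, or be applied to an existing index with the Java client's update settings API, roughly like this (the index name is a placeholder):

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class MergeTuningSketch {
    // Applies the merge-policy settings above to one index.
    // The index name is a placeholder for whichever index is being written.
    static void applyMergeSettings(Client client, String indexName) {
        client.admin().indices().prepareUpdateSettings(indexName)
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.merge.policy.max_merged_segment", "2g")
                        .put("index.merge.policy.segments_per_tier", 24)
                        .put("index.merge.policy.max_merge_at_once", 8)
                        .build())
                .execute().actionGet();
    }
}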

Jörg




Re: Apparent memory leak after a few days of heavy indexing

kimchy
In reply to this post by rafe
Can you open an issue and share an example so we can recreate it and try and chase it up?



Re: Apparent memory leak after a few days of heavy indexing

rafe
In reply to this post by joergprante@gmail.com
Sorry for the incomplete information. I'm running ES 0.90.1 with default JVM memory settings, except for -Xmx16g and -Xms16g. The chart I tried to attach before is at http://i.imgur.com/vYeDEsc.png; hopefully that link won't get stripped out as well.

Jörg, could merging cause the kinds of pauses and memory growth I am seeing? A merge that runs for a few hours and consumes around 6GB seems unreasonable, and also seems really unlikely based on my experience with Lucene.

Shay, I will work on submitting a ticket.

Rafe


Re: Apparent memory leak after a few days of heavy indexing

vinh
Rafe, just curious… what size are your indexes? Do you cap them and create a new index once the size is reached? Or are you indexing into a single index?
-Vinh


Re: Apparent memory leak after a few days of heavy indexing

rafe
Vinh, I am creating one index per day's worth of data (so there is no size threshold for creating a new index). My indices are ~500 million docs, 75GB on disk (for the primary shards).
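
The day's index name is just derived from the date, something like this (the "events-" prefix is a placeholder, not the real index name):

import java.text.SimpleDateFormat;
import java.util.Date;

public class DailyIndexNameSketch {
    // Builds a per-day index name such as "events-2013.07.18".
    // The prefix is a placeholder; the real naming scheme isn't shown in this thread.
    static String todaysIndex() {
        return "events-" + new SimpleDateFormat("yyyy.MM.dd").format(new Date());
    }
}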

Rafe

