Flush API and Garbage Collection

Flush API and Garbage Collection

vaidik
I am trying to understand the Flush API and some JVM-related issues I am seeing.

The Flush API in the guide says:
The flush API allows to flush one or more indices through an API. The flush process of an index basically frees memory from the index by flushing data to the index storage and clearing the internal transaction log. By default, ElasticSearch uses memory heuristics in order to automatically trigger flush operations as required in order to clear memory.

Does this mean that if I make a flush request manually, everything related to the transaction log becomes eligible for garbage collection?
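
(For concreteness, a manual flush can be issued with a plain HTTP call; the host, port, and the index name "myindex" below are placeholders, and the URL form shown is the 0.90-era one.)

curl -XPOST 'http://localhost:9200/myindex/_flush'   # flush a single index
curl -XPOST 'http://localhost:9200/_flush'           # flush all indices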

Vaidik Kapoor
vaidikkapoor.info

Re: Flush API and Garbage Collection

Mark Walkom
It means a flush cleans out the transaction log for the specified index and gives that memory up for GC.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com

Re: Flush API and Garbage Collection

vaidik
Okay, thanks for clearing that up, Mark.

So here is what I am noticing, and it is giving me a lot of trouble. I have two nodes with 32 GB RAM, half of which is allocated to ES. I am just building something out and wanted to see how fast I can write to ES, so at the moment I am only indexing and not querying at all. After a couple of hours, heap usage was at about 97%, and GC was taking a really long time to run (on the order of many seconds) and running frequently without freeing much heap for reuse. Then I stopped indexing altogether, manually flushed the index, and waited for GC to free up some memory. Sadly, that is not what I observed.

Since I am new to ES, and after reading whatever I could find so far, I cannot work out what else ES might be using the heap for, especially when I am not indexing anything, not querying the nodes either, and have manually flushed the index using the Flush API. What else might be causing such high heap usage?
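
(For anyone checking this themselves: the node stats API reports heap usage and GC counts/timings per node. A sketch for a 0.90-era cluster on localhost; the query parameters changed in later versions.)

# Per-node JVM stats: heap used, GC collection counts and total times.
curl -XGET 'http://localhost:9200/_nodes/stats?jvm=true&pretty'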

This concerns me because the write speed drastically drops in such situations.

Any help would be appreciated.

Vaidik Kapoor
vaidikkapoor.info

Re: Flush API and Garbage Collection

Mark Walkom
The smallest part of a shard is a segment, and Lucene caches data at that level, which is likely what you are seeing in your heap. ES aggressively caches data so that queries are as fast as possible.

(This is obviously dependent on your data set size.)
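
(A hedged way to test whether caches are what is holding the heap: the clear-cache API drops field data and filter caches, so heap that does not come back afterwards is held by something else. Assumes a local node; the parameters the call accepts vary by version.)

# Drop ES caches for all indices, then watch whether heap usage falls.
curl -XPOST 'http://localhost:9200/_cache/clear'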

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com

Re: Flush API and Garbage Collection

joergprante@gmail.com
You do not mention the ES version, the heap size you use, or the data volume you handle when indexing, so it is not easy to help.

Anyway, from the situation you describe, what you observe does not have much to do with flush or the translog. Most probably it is segment merging. After a few hours of constant indexing your segments grow larger and larger, and the re-loading of segments during merges eats into the heap.

Note that the default maximum merged-segment size is 5 GB. That means segments may grow up to this size and be loaded into the heap for merging. In bad cases a merge may take a long time, long enough for nodes to disconnect from the cluster because they cannot respond to the heartbeat signal from other nodes.

You should try streamlining your indexing by choosing smaller maximum segment sizes. Example:

index.merge.policy.max_merged_segment: 2g
index.merge.policy.segments_per_tier: 24
index.merge.policy.max_merge_at_once: 8
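
(These can be set in elasticsearch.yml, or, depending on the ES version, applied at runtime through the index update-settings API. A sketch, assuming an index named "myindex" on a local node; verify that your version allows changing these dynamically.)

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.merge.policy.max_merged_segment": "2g",
  "index.merge.policy.segments_per_tier": 24,
  "index.merge.policy.max_merge_at_once": 8
}'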

You can also try experimenting with the number of shards per node. The more shards, the longer it takes before segments get big. But more shards also mean more resource consumption per node.
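
(Shard count is fixed when an index is created, so experimenting with it means creating a fresh index; the index name and shard count below are illustrative.)

curl -XPUT 'http://localhost:9200/myindex_test' -d '{
  "settings": { "index.number_of_shards": 10 }
}'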

Jörg

Re: Flush API and Garbage Collection

Alexander Reelsen
Hey,

Before tuning, it would be handy to know what is in the heap, along with its size. You can use the monitoring APIs to gather more information while the heap is filling up during indexing. There may also be log entries about slow garbage collections.
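
(Two hedged starting points: the segments API lists the Lucene segments per shard, and ES logs warnings about long GC pauses. The log path below depends on how ES was installed.)

# List segments per index/shard (sizes and doc counts).
curl -XGET 'http://localhost:9200/_segments?pretty'

# Look for slow-GC warnings emitted by the JVM monitor.
grep -i "\[gc\]" /var/log/elasticsearch/*.log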



--Alex

Re: Flush API and Garbage Collection

Jason Wee
Hi Jörg,

=> You can also try experimenting with the number of shards per node. The more shards, the longer it takes before segments get big. But, more shards also mean more resource consumption per node.

What resources are consumed per node? Is there a good indicator (e.g., an API or monitoring tools)?

/Jason

Re: Flush API and Garbage Collection

Mark Walkom
For monitoring you can use ElasticHQ, kopf, or bigdesk.
These all take API output and turn it into something a little more digestible.
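
(All three are site plugins of that era; the install commands below are sketches from memory, and the exact flag and GitHub path vary between ES 0.90 and 1.x, so check each project's README before running them.)

bin/plugin -install lmenezes/elasticsearch-kopf    # kopf
bin/plugin -install royrusso/elasticsearch-HQ      # ElasticHQ
bin/plugin -install lukas-vlcek/bigdesk            # bigdesk

# Once installed, each is served by the node itself, e.g.:
#   http://localhost:9200/_plugin/kopf/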

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com