Heap / GC Issues


Ned Campion
Hey All,

I've got a cluster with 5 data nodes (plus 2 master nodes). The cluster has ~100 indices, with doc counts in the 1k-50k range. There is a low-to-medium amount of indexing load going into the cluster via the bulk API, and a large amount of search traffic, in the 40k queries-per-second range.

I'm running these data nodes on EC2 (c3.8xls) with a 30GB heap, though at the time of the following sample I was testing out a 20GB heap. The process runs well for a while, a couple of hours to a day or two depending on traffic, and then it gets into a bad state where it is continually doing long GC runs, i.e. every minute or so a 30-45 second stop-the-world collection that seemingly accomplishes very little (e.g. starting at 18.8GB heap usage and only getting down to 18.3GB).

Here the red line is a data node exhibiting the behavior: a graph of the "old" generation growing to nearly the full heap size and then staying there for hours. During this time the application is severely degraded.


Example of one of the GC runs during this time (again, they run every minute or so):

[2014-07-18 00:24:24,735][WARN ][monitor.jvm ] [prod-targeting-es2] [gc][old][10799][27] duration [41.5s], collections [1]/[42.5s], total [41.5s]/[2.2m], memory [18.8gb]->[18.3gb]/[19.8gb], all_pools {[young] [733.2mb]->[249.9mb]/[1.4gb]}{[survivor] [86mb]->[0b]/[191.3mb]}{[old] [18gb]->[18.1gb]/[18.1gb]}
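As a sanity check on what each of these collections actually reclaims, a monitor.jvm line like the one above can be parsed mechanically (a quick illustrative sketch, not an official tool):

```python
import re

# An Elasticsearch 1.x monitor.jvm GC warning line, as logged above.
line = ("[2014-07-18 00:24:24,735][WARN ][monitor.jvm ] [prod-targeting-es2] "
        "[gc][old][10799][27] duration [41.5s], collections [1]/[42.5s], "
        "total [41.5s]/[2.2m], memory [18.8gb]->[18.3gb]/[19.8gb], "
        "all_pools {[young] [733.2mb]->[249.9mb]/[1.4gb]}"
        "{[survivor] [86mb]->[0b]/[191.3mb]}{[old] [18gb]->[18.1gb]/[18.1gb]}")

# Pull out the pause duration and the heap before/after/max figures.
m = re.search(r"duration \[([\d.]+)s\].*"
              r"memory \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]", line)
duration_s, before_gb, after_gb, max_gb = map(float, m.groups())
reclaimed_gb = round(before_gb - after_gb, 1)

# A 41.5s stop-the-world pause that frees only ~0.5GB of a ~20GB heap is the
# signature of a heap full of live (referenced) objects, not garbage.
print(duration_s, reclaimed_gb, max_gb)
```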

We are running Elasticsearch 1.2.2. We had been on Oracle JDK 7u25 and tried upgrading to 7u65 with no effect. I just did a heap dump analysis using jmap and Eclipse Memory Analyzer and found that 85% of the heap was taken up by the filter cache.



We are doing a lot of "bool" conditions in our queries, so that may be a factor in the hefty filter cache.
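For reference: in Elasticsearch 1.x the filter cache is bounded by the `indices.cache.filter.size` node setting (default 10% of heap), and its current size can be read from node stats. A sketch (setting and endpoint names per the 1.x docs; the percentage shown is just the default):

```
# elasticsearch.yml -- cap the node-level filter cache (default: 10% of heap)
indices.cache.filter.size: 10%

# Inspect current filter cache usage on each node:
curl 'localhost:9200/_nodes/stats/indices/filter_cache?pretty'
```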

Any ideas out there? Right now I have to bounce my data nodes every hour or two to ensure I don't reach this degraded state.


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/288da6e7-b85a-4cbf-a83d-d777ee7c9c57%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Heap / GC Issues

Mark Walkom
How many indexes and how much data do you have?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


Re: Heap / GC Issues

Ned Campion
Hey Mark,

The index is currently 16GB with 10 indices (added and removed daily, updated at a fairly low but constant pace), and just 448k documents between them (with geoshape + terms indexes).

Thanks for any and all help! Happy to provide further info if desired.

Best,
Ned


Re: Heap / GC Issues

Mark Walkom
How much in total, across the entire cluster?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


Re: Heap / GC Issues

Ned Campion
5 data nodes as described, plus 2 master nodes and 50 clients connected directly; that's it. For documents: 448k. Hope that's what you mean.

Thank you for the help!

Re: Heap / GC Issues

Mark Walkom
You said you have one index of 16GB, but also that you have ~100 indexes in total. How much data is there across all those indexes, i.e. the cluster total in GB/TB?


Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


Re: Heap / GC Issues

Ned Campion
Did I? Sorry, I meant 16GB across 100 indices.

Re: Heap / GC Issues

joergprante@gmail.com
In reply to this post by Ned Campion
You are on the right track, and you have already found the answer to your question: examine your queries. They seem to be cached and are eating your heap.
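As a stopgap while the queries are being reworked, the filter cache can be dropped on demand via the clear-cache API (as documented for ES 1.x; this buys time but does not fix the root cause):

```
# Clear only the filter cache, across all indices:
curl -XPOST 'localhost:9200/_cache/clear?filter=true'
```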


Jörg


Re: Heap / GC Issues

Clinton Gormley
Your filter cache is only taking up 3GB of the heap, which fits with the default limit of 10% of heap space. So the filter cache is not at fault here.  

I would look at the two usual suspects:

* field data - how much space is this consuming? Try: 
    curl 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
* swap - is it completely disabled?
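If field data turns out to be the culprit, ES 1.x lets you bound it (setting name per the 1.x docs; the percentage here is illustrative, and note field data is unbounded by default in this version):

```
# elasticsearch.yml -- cap the field data cache (unbounded by default in 1.x)
indices.fielddata.cache.size: 40%
```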

Swap is a common gotcha. If you have any data in swap, it will slow GCs down terribly and give you exactly the scenario you are seeing. See the docs for how to disable swap: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html#setup-configuration-memory

(I prefer disabling swap completely, rather than relying on mlockall or swappiness.)
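A sketch of the options the docs describe (commands and setting names as in the 1.x setup guide; pick one approach rather than applying all three):

```
# Option 1: disable swap entirely (preferred above) -- run as root, and
# remove swap entries from /etc/fstab so it stays off after reboot:
swapoff -a

# Option 2: elasticsearch.yml -- lock the JVM heap into RAM:
bootstrap.mlockall: true

# Option 3: sysctl -- tell the kernel to swap only as a last resort:
vm.swappiness: 1
```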



