es got blocked

es got blocked

Sisu Alexandru
Hello all,

Our ES suddenly got blocked for 15 minutes. That means it suddenly stopped handling search requests and even status requests (curl -XGET http://localhost:9200/_status).

ES version: 0.18.5
The index is around 200 GB.
One ES client.
20 shards, all on a single machine.
No replicas.
OS: CentOS 5.


This info was obtained with jstack (jstack -F 27029, where 27029 is the PID of ES). It looks like all the threads got blocked (all 132 of them)?!

The full jstack output is much bigger; I've copied only some parts of it.

Any ideas?

Thanks in advance,

Alex



Re: es got blocked

Paul Loy
Garbage collection? How much memory are you giving each JVM? If it's a large amount and you haven't tuned the JVM's GC options, that's a likely cause.

I don't suppose you had something monitoring JMX over that period? If you did, you'd be able to tell whether this was the issue by checking whether the used heap space dropped off sharply.

Paul.



--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy

Re: es got blocked

Sisu Alexandru
Well, I started Elasticsearch with Xms and Xmx set to 100G.
That should've been enough.

Monitoring through JMX sounds like a good idea, but how do you configure JMX options in ES?
The only documentation I found is http://www.elasticsearch.org/guide/reference/modules/jmx.html, but it's not very clear to me what I should do.

Thanks,

Alex



Re: es got blocked

kimchy
Administrator
You configured ES with 100g? Do you have a machine with 100 GB of memory? How much memory does your machine have? I'm surprised it even started... (swap might be really big).




Re: es got blocked

Sisu Alexandru
Yes, the machine has 128 GB of RAM and 48 cores, and until now we haven't had any problems.
Anyhow, yesterday I restarted the machine and everything went back to normal.
Still, I'm interested in how to configure JMX for ES. Any tips? :)





Re: es got blocked

Paul Loy
That's some machine!

Yeah, with stutters like this it's very useful to know what's going on with your heap (and other resources). You can watch the heap via JConsole, use a monitoring tool like Traverse to fire emails when the heap gets too big, or simply view historical graphs of all your JMX-exposed variables.

To enable JMX, it looks like you need jmx.create_connector: true in your elasticsearch.yml.

VisualVM also has an awesome Visual GC plugin that lets you see which sections of your heap are filling up.
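A hedged sketch of what the configuration might look like, based on the 0.18.x JMX module settings; only create_connector is strictly needed, and the port range and domain shown are assumptions about that era's defaults:

```yaml
# elasticsearch.yml -- sketch, assuming the 0.18.x JMX module settings.
# jmx.port and jmx.domain are optional; the values below are what I
# believe the defaults were, not verified against this exact version.
jmx.create_connector: true
jmx.port: 9400-9500        # connector binds the first free port in this range
jmx.domain: elasticsearch
```

With the connector created, JConsole or VisualVM should be able to attach to the node remotely and graph heap usage over time.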

Cheers,

Paul.



Re: es got blocked

kimchy
Administrator
Also, I would add: make sure to enable mlockall in the configuration so the OS will not swap the Elasticsearch process. I've never run ES with 100 GB of memory; what's your typical memory usage? (Node stats can give you a lot of information, including at the JVM level.)
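A minimal sketch of the setting being suggested, assuming the 0.18.x option name:

```yaml
# elasticsearch.yml -- sketch; bootstrap.mlockall asks the JVM to
# mlockall() its memory so the OS cannot swap the heap out.
bootstrap.mlockall: true
```

Note that for the lock to actually succeed, the user running ES typically also needs the locked-memory ulimit raised (e.g. ulimit -l unlimited), and Xms should equal Xmx so the whole heap is allocated and locked up front.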



Re: es got blocked

Sisu Alexandru
Hello all,

Following your suggestions, I've tried:
- running with bootstrap.mlockall set to true
- enabling JMX monitoring

ES continues to hang, and I can reproduce the problem again and again by executing the following query:

{
  "facets": {
    "term_count": {
      "global": true,
      "terms": {
        "field": "body",
        "size": 100
      }
    }
  },
  "size": 0
}

(I'm trying to retrieve the 100 most common terms for this field.)
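As an aside on why this particular query is so expensive: a terms facet has to count every distinct term of the field across all documents before it can pick the top 100, so memory scales with the field's vocabulary, which is enormous for an analyzed free-text field like body. A toy model of the computation (illustrative only, not ES code; the sample documents are made up):

```python
from collections import Counter

# Toy model of what a "terms" facet computes: the N most frequent
# terms of a field across all documents. Every distinct term must be
# counted in memory before the top N can be chosen.
docs = [
    "the quick brown fox",
    "the lazy dog jumps",
    "the quick dog runs",
]

counts = Counter(term for doc in docs for term in doc.split())
print(counts.most_common(2))  # [('the', 3), ('quick', 2)]
```

On a 200 GB index the analogue of `counts` is the field cache for the whole body field, which is what appears to be exhausting the 100 GB heap in seconds.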

To my surprise, I discovered a set of huge files in the ES folder: java_pid_xxx.hprof, generated each time ES got 'blocked'.
Now, these files are generated when the process runs out of memory.

I'm running with 100 GB of heap allocated, and I expected that for memory-hungry operations the memory would be swapped.

Anyhow, it seems that each time I run the above query on this machine, I kill it.

Other information:
- the jhat heap histogram can be found here: https://gist.github.com/1497379
- through JConsole, each time I execute this query I can see the heap usage grow very fast; it seems 100 GB is consumed in a few seconds
- regarding the node stats and cluster stats ES provides: I cannot access them after ES dies, but here is the ES info right after a restart, when everything is OK:
os: {
  refresh_interval: 1000
  cpu: {
    vendor: AMD
    model: Opteron
    mhz: 1900
    total_cores: 48
    total_sockets: 4
    cores_per_socket: 12
    cache_size: 512b
    cache_size_in_bytes: 512
  }
  mem: {
    total: 126gb
    total_in_bytes: 135321870336
  }
  swap: {
    total: 64gb
    total_in_bytes: 68803354624
  }
}
process: {
  refresh_interval: 1000
  id: 15926
  max_file_descriptors: 128000
}
jvm: {
  pid: 15926
  version: 1.6.0_27
  vm_name: Java HotSpot(TM) 64-Bit Server VM
  vm_version: 20.2-b06
  vm_vendor: Sun Microsystems Inc.
  start_time: 1324275713418
  mem: {
    heap_init: 100gb
    heap_init_in_bytes: 107374182400
    heap_max: 99.9gb
    heap_max_in_bytes: 107302223872
    non_heap_init: 23.1mb
    non_heap_init_in_bytes: 24313856
    non_heap_max: 130mb
    non_heap_max_in_bytes: 136314880
  }
}



Any suggestions?

Thanks,
Alex





Re: es got blocked

Aurélien-2
Hello.

The hprof files are generated by the -XX:+HeapDumpOnOutOfMemoryError parameter. Each file should be roughly the same size as the heap at the moment the OOM occurs, but could you give us the size of those files?

The jhat profile does not show a 100 GB used heap, but do you have any OOM errors in the log files? I don't know if you modified the launch scripts, but these errors should show up on stderr. Do you see huge CPU consumption on one or more CPUs during the "es blocked" situation?

It would be interesting to:
- redirect stderr and stdout to a file if that's not already done (with &> /path/to/file.log at the end of the java command line)
- activate verbose GC logging to a specific file
- launch VisualVM with the Visual GC plugin
- take recurring thread dumps (with kill -3 <pid>) over a short period (one thread dump every 10 or 15 seconds)

and then reproduce the problem as soon as possible, to avoid generating too many logs.
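The recurring thread dumps can be scripted; here is a hypothetical helper (the function name, interval, and count are arbitrary choices, not from the thread) that sends SIGQUIT, the signal behind kill -3, to the JVM on a timer:

```python
import os
import signal
import time

# Hypothetical helper for taking recurring thread dumps. The JVM
# reacts to SIGQUIT (what `kill -3` sends) by printing a full thread
# dump to its stdout, so stdout must already be redirected to a log
# file for the dumps to be captured.
def take_thread_dumps(pid, interval=10, count=12):
    for _ in range(count):
        try:
            os.kill(pid, signal.SIGQUIT)
        except ProcessLookupError:  # target died, e.g. killed by the OOM
            break
        time.sleep(interval)
```

For example, take_thread_dumps(15926, interval=15, count=8) would cover two minutes around a reproduction of the hang.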

If you hit an OOM, verbose GC and Visual GC will show you which generation is full; if you have a thread problem, you will see it in the thread dumps. I use Samurai (http://yusuke.homeip.net/samurai/) to analyze thread dumps and MAT to analyze heap dumps (hprof files), though a huge hprof may be difficult to read, maybe even with jhat. Samurai can read verbose GC files too.

In any case, try a more recent JVM, please send the complete command line of your ES process, and maybe try another JDK (like JRockit, with either its generational or non-generational GC).

100 GB is a huge heap for current garbage collectors; see http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection

Regards.

De: "Sisu Alexandru" <[hidden email]>
À: [hidden email]
Envoyé: Lundi 19 Décembre 2011 15:11:30
Objet: Re: es got blocked

Hello all,

Following your suggestion I've tried:
- running with bootstrap.mlockall set to True.
- i've enabled the JMX monitoring.

The es continues to hangs. I can reproduce the problem again and again, by executing the following query:
{  "facets": {   "term_count": {      "global": true,      "terms": {        "field": "body",        "size": 100      }    }  },  "size": 0}

(I'm trying to retrieve the most common 100 terms for this field).

To my surprise, I've discovered in the es folder, a set of huge files: java_pid_xxx.hprof.. Generated each time my es got 'blocked'.
Now, this files are generated when the process runs out of memory. 

I'm running with 100G of heap memory allocated, and I expect that for large memory consuming operations the memory to be swapped.

Anyhow, it seems that on my machine, each time I'm running the above query, I'm killing it.

Other informations:
- the jhat heap histogram can be found here: https://gist.github.com/1497379
- through JConsole, each time I'm executing this query, I can see how the heap memory increases !very fast! it seems that 100G are consumed in a few seconds.
- regarding the node stats and cluster stats es provides: I cannot access them after es dies, but here is the info of es, right after restart when everything is okey:
os: {
  refresh_interval: 1000
  cpu: {
  vendor: AMD
  model: Opteron
  mhz: 1900
  total_cores: 48
  total_sockets: 4
  cores_per_socket: 12
  cache_size: 512b
  cache_size_in_bytes: 512
  }
  mem: {
  total: 126gb
  total_in_bytes: 135321870336
  }
  swap: {
  total: 64gb
  total_in_bytes: 68803354624
  }
  }
  process: {
  refresh_interval: 1000
  id: 15926
  max_file_descriptors: 128000
  }
  jvm: {
  pid: 15926
  version: 1.6..0_27
  vm_name: Java HotSpot(TM) 64-Bit Server VM
  vm_version: 20.2-b06
  vm_vendor: Sun Microsystems Inc.
  start_time: 1324275713418
  mem: {
  heap_init: 100gb
  heap_init_in_bytes: 107374182400
  heap_max: 99.9gb
  heap_max_in_bytes: 107302223872
  non_heap_init: 23.1mb
  non_heap_init_in_bytes: 24313856
  non_heap_max: 130mb
  non_heap_max_in_bytes: 136314880
  }
  }



Any suggestions?

Tnx,
Alex


On Tue, Dec 13, 2011 at 11:23 PM, Shay Banon <[hidden email]> wrote:
Also, I would add that you make sure to enable mlockall in the configuration to make sure the OS will not swap the elasticsearch process. I never ran ES with 100gb of memory, whats your typical memory usage? (node stats can give you a lot of information, also on the jvm level).


On Tue, Dec 13, 2011 at 12:17 PM, Paul Loy <[hidden email]> wrote:
That's some machine!

Yeah, with stutters like this it's very useful to know what's going on with your heap (and other resources). You can watch the heap via JConsole or use some monitoring tool like traverse to fire emails when the heap gets too big or simply be able to view historical graphs of all your JMX exposed variables.

To enable JMX it looks like you need jmx.create_connector: true in your elasticsearch.yml.

VisualVM also has an awesome Visual GC plugin that lets you see which of the various sections of your Heap are filling up.

Cheers,

Paul.



On Tue, Dec 13, 2011 at 12:54 AM, Sisu Alexandru <[hidden email]> wrote:
Yes, the machine has 128 GB , 48 cores. And till now we didn't have no problems.
Anyhow, yesterday I restarted the machine, and everything got back to normal.
Still I'm interesting in how to configure the jmx for es. Any tips? :)

On Mon, Dec 12, 2011 at 11:43 PM, Shay Banon <[hidden email]> wrote:
You configured ES with 100g? Do you have a machine with a 100gb of memory? How much memory does your machine has? I am surprise that it even started.... (swap might be really big).


On Mon, Dec 12, 2011 at 11:12 PM, Sisu Alexandru <[hidden email]> wrote:
Well I started up elastic search with Xms and Xmx set to 100G.
That should've been enough.

Monitoring though JMX sounds a good ideea. but how can you configure jmx options in es?
The only documentation that I found was here http://www.elasticsearch.org/guide/reference/modules/jmx.html but its not very clear to me what should I do.

Tnx,

Alex


On Mon, Dec 12, 2011 at 5:40 PM, Paul Loy <[hidden email]> wrote:
Garbage Collection? How much memory are you giving each JVM? If it's a large amount and you haven't tuned your GC options on the JVM, this is a likely cause.

I don't suppose you had something monitoring JMX over that time period. If you did you'd be able to see if this was the issue if you notice the Heap Space Used dropping off.

Paul.


On Mon, Dec 12, 2011 at 8:00 AM, Sisu Alexandru <[hidden email]> wrote:
Hello all,

Our es suddenly got  blocked for 15 minutes..  That means: it suddenly stopped handling search requests and also status requests:curl -XGET http://localhost:9200/_status.

ES version: 0.18.5
The size of the index is arround 200G.
One ES client.
20 shards. All on one single machine.
No mirrors. 
OS: centos 5.


This info was obtained by jstack. (jstack -F 27029 (pid of es)). It looks like all the threads got blocked (all 132 )?!

The size of the jstack output is much more bigger  I've copy pasted only some parts of it.

Any ideas?

Tnx in advance,

Alex






--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy






--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy


Reply | Threaded
Open this post in threaded view
|

Re: es got blocked

Aurélien-2
In reply to this post by Sisu Alexandru
Hello.

hprof files are generated by -XX:+HeapDumpOnOutOfMemoryError
parameter. The files should be roughly same size as heap size when OOM
occurs, but could you give us the size of those files?

jhat profiles does not show a 100GB used heap, but do you have any OOM
error in log files? I don't know if you modified launch scripts, but
theses errors should be redirected via stderr. Do you have a huge CPU
consommation on 1 or more CPU during the "es blocked" situation?

It should be interesting to:
- redirect stderr and stdout to a file if it's not done already (with
&> /path/to/file.log at the end of java command file)
- activate verbose gc to a specific file
- launch visualvm with visualgc plugin
- launch recurrent thread dumps (with kill -3 <pid>) within short time
period (one thread dump every 10 or 15 seconds)

and then reproduce the problem asap to avoid generating to much logs.

If you have an OOM, you will wich generation is full with the
verbosegc and visualgc, if you have a problem with threads you will
see them in thread dumps (I use samurai http://yusuke.homeip.net/samurai/
to analyze thread dumps), MAT to analyze heap dump/hprof files (but
huge hprof should be difficult to read, maybe with jhat). Samurai can
read verbosegc files too.

In any case, try to use a more recent JVM, please send the complete
command line of your ES, and maybe try another JDK (like jrockit with
either generation/non generational GC).

100GB is a quite huge memory for actual GC, see
http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection

Rgds.

    De: "Sisu Alexandru" <[hidden email]>
    À: [hidden email]
    Envoyé: Lundi 19 Décembre 2011 15:11:30
    Objet: Re: es got blocked

    Hello all,
    Following your suggestion I've tried:
    - running with bootstrap.mlockall set to True.
    - i've enabled the JMX monitoring.
    The es continues to hangs. I can reproduce the problem again and
again, by executing the following query:
    {  "facets": {   "term_count": {      "global": true,
"terms": {        "field": "body",        "size": 100      }    }  },
"size": 0}
    (I'm trying to retrieve the most common 100 terms for this field).
    To my surprise, I've discovered in the es folder, a set of huge
files: java_pid_xxx.hprof.. Generated each time my es got 'blocked'.
    Now, this files are generated when the process runs out of
memory.
    I'm running with 100G of heap memory allocated, and I expect that
for large memory consuming operations the memory to be swapped.
    Anyhow, it seems that on my machine, each time I'm running the
above query, I'm killing it.
    Other informations:
    - the jhat heap histogram can be found here: https://gist.github.com/1497379
    - through JConsole, each time I'm executing this query, I can see
how the heap memory increases !very fast! it seems that 100G are
consumed in a few seconds.
    - regarding the node stats and cluster stats es provides: I cannot
access them after es dies, but here is the info of es, right after
restart when everything is okey:
    os: {
      refresh_interval: 1000
      cpu: {
      vendor: AMD
      model: Opteron
      mhz: 1900
      total_cores: 48
      total_sockets: 4
      cores_per_socket: 12
      cache_size: 512b
      cache_size_in_bytes: 512
      }
      mem: {
      total: 126gb
      total_in_bytes: 135321870336
      }
      swap: {
      total: 64gb
      total_in_bytes: 68803354624
      }
      }
      process: {
      refresh_interval: 1000
      id: 15926
      max_file_descriptors: 128000
      }
      jvm: {
      pid: 15926
      version: 1.6..0_27
      vm_name: Java HotSpot(TM) 64-Bit Server VM
      vm_version: 20.2-b06
      vm_vendor: Sun Microsystems Inc.
      start_time: 1324275713418
      mem: {
      heap_init: 100gb
      heap_init_in_bytes: 107374182400
      heap_max: 99.9gb
      heap_max_in_bytes: 107302223872
      non_heap_init: 23.1mb
      non_heap_init_in_bytes: 24313856
      non_heap_max: 130mb
      non_heap_max_in_bytes: 136314880
      }
      }
    Any suggestions?
    Tnx,
    Alex
Reply | Threaded
Open this post in threaded view
|

Re: es got blocked

Sisu Alexandru
Hi Aurelien

Tnx for the prompt answer. 
I run es in the foreground so that I can see any eventual OOM. And indeed it turned out to be an OOM:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid29451.hprof ...
Exception in thread "elasticsearch[search]-pool-3-thread-18" java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.index.field.data.support.FieldDataLoader.load(FieldDataLoader.java:61)
at org.elasticsearch.index.field.data.strings.StringFieldData.load(StringFieldData.java:84)
at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:52)
at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:34)
at org.elasticsearch.index.field.data.FieldData.load(FieldData.java:110)
at org.elasticsearch.index.cache.field.data.support.AbstractConcurrentMapFieldDataCache.cache(AbstractConcurrentMapFieldDataCache.java:119)
at org.elasticsearch.search.facet.terms.strings.TermsStringOrdinalsFacetCollector.doSetNextReader(TermsStringOrdinalsFacetCollector.java:127)
at org.elasticsearch.search.facet.AbstractFacetCollector.setNextReader(AbstractFacetCollector.java:71)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:576)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:199)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:383)

Command line params:

home/jdk1.6.0_27/bin/java -Xms256m -Xmx1g -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/home/elasticsearch-0.18.5 -Des-foreground=yes -cp :/home/elasticsearch-0.18.5/lib/*:/home/elasticsearch-0.18.5/lib/sigar/* -Xms100G -Xmx100G org.elasticsearch.bootstrap.ElasticSearch

The size of the hprof dump file:
- currently it is 70GB and still increasing (more slowly). It will probably reach 100GB.

Right after this, I ran the same query on my development machine against a smaller dataset.
It seems that ResidentFieldDataCache (which extends AbstractConcurrentMapFieldDataCache) doesn't get cleared? (I tried to put a breakpoint on the clear method, and it never gets called.)
On the other hand, I also didn't configure my index with any caching options.

I'm working now on getting the thread dumps and the verbose output of gc.
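(For reference, a minimal sketch of the flags that enable verbose gc logging on a HotSpot JDK 6 JVM — the log path is a placeholder, appended to the existing java command line:)

```shell
# Sketch: HotSpot JDK 6 GC logging flags; /var/log/es-gc.log is a placeholder.
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps \
-Xloggc:/var/log/es-gc.log
```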



On Mon, Dec 19, 2011 at 3:37 PM, Aurélien <[hidden email]> wrote:
Hello.

hprof files are generated by the -XX:+HeapDumpOnOutOfMemoryError
parameter. The files should be roughly the same size as the heap when the
OOM occurs, but could you give us the size of those files?

The jhat profile does not show a 100GB used heap, but do you have any OOM
errors in the log files? I don't know if you modified the launch scripts,
but these errors should be written to stderr. Do you have huge CPU
consumption on one or more CPUs during the "es blocked" situation?

It would be interesting to:
- redirect stderr and stdout to a file if it's not done already (with
&> /path/to/file.log at the end of the java command line)
- activate verbose gc output to a specific file
- launch visualvm with the visualgc plugin
- take recurrent thread dumps (with kill -3 <pid>) within a short time
period (one thread dump every 10 or 15 seconds)

and then reproduce the problem asap to avoid generating too much log output.
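(The recurrent thread dumps can be scripted; a minimal sketch — the function name is ours, and PID is the elasticsearch java pid. kill -3 sends SIGQUIT, which makes a HotSpot JVM print a full thread dump to its stdout instead of exiting:)

```shell
# dump_threads PID COUNT INTERVAL
# Sends SIGQUIT (the signal behind `kill -3`) COUNT times, INTERVAL
# seconds apart. A HotSpot JVM prints a full thread dump to its stdout
# on each SIGQUIT; the signal does not terminate the process.
dump_threads() {
  pid=$1
  count=${2:-6}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$count" ]; do
    kill -QUIT "$pid" || return 1
    sleep "$interval"
    i=$((i+1))
  done
}
```

e.g. `dump_threads 27029 6 10` for six dumps, ten seconds apart.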

If you have an OOM, you will see which generation is full with the
verbose gc output and visualgc; if you have a problem with threads you
will see it in the thread dumps (I use samurai http://yusuke.homeip.net/samurai/
to analyze thread dumps), and MAT to analyze heap dumps/hprof files (a
huge hprof may be difficult to read, though, maybe with jhat). Samurai can
read verbosegc files too.

In any case, try to use a more recent JVM, please send the complete
command line of your ES, and maybe try another JDK (like JRockit with
either a generational or non-generational GC).

100GB is a very large heap for current GCs, see
http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection

Rgds.


Reply | Threaded
Open this post in threaded view
|

Re: es got blocked

Aurélien-2
Hi.

don't bother taking the thread dumps, it's a clear OOM, thread dumps won't
be really helpful.

I've never used jhat to do OOM analysis, and I don't know if MAT
http://eclipse.org/mat/ will handle an hprof of 100GB. Worth a try on a
machine with a lot of memory and a well-customized Eclipse.
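(A sketch of how to give the analysis tool itself a bigger JVM — jhat's -J option passes flags through to the underlying JVM; the heap size and file name here are placeholders:)

```shell
# Sketch: run jhat with a large heap of its own; values are placeholders.
jhat -J-Xmx200g java_pid29451.hprof
```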

Furthermore, if ES works on Java 7, I would try it, and simplify the
command line by removing the first -Xms/-Xmx and all the -XX parameters.
It will certainly not solve the OOM, but the new G1 GC may be more
efficient on large heap sizes.
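(A sketch of what such a simplified JDK 7 command line could look like — the JDK path is a placeholder, the other paths mirror the command line posted earlier:)

```shell
# Sketch: simplified launch on JDK 7 with G1; the JDK path is a placeholder.
/home/jdk1.7.0/bin/java -Xms100G -Xmx100G \
  -XX:+UseG1GC \
  -XX:+HeapDumpOnOutOfMemoryError \
  -Delasticsearch -Des.path.home=/home/elasticsearch-0.18.5 \
  -Des-foreground=yes \
  -cp ":/home/elasticsearch-0.18.5/lib/*:/home/elasticsearch-0.18.5/lib/sigar/*" \
  org.elasticsearch.bootstrap.ElasticSearch
```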

I will let other people answer on the code specifics, I'm not a Java dev.

Regards.

Reply | Threaded
Open this post in threaded view
|

Re: es got blocked

kimchy
Administrator
In reply to this post by Sisu Alexandru
It seems like you are trying to get terms on a field (body) that has many distinct terms (guessing by its name), resulting in the excessive memory usage and the OOM. The terms facet is not designed to be used on fields with many unique terms.

On Mon, Dec 19, 2011 at 4:11 PM, Sisu Alexandru <[hidden email]> wrote:
Hello all,

Following your suggestion I've tried:
- running with bootstrap.mlockall set to true.
- enabling JMX monitoring.

The es continues to hang. I can reproduce the problem again and again, by executing the following query:
{  "facets": {   "term_count": {      "global": true,      "terms": {        "field": "body",        "size": 100      }    }  },  "size": 0}

(I'm trying to retrieve the most common 100 terms for this field).

To my surprise, I've discovered in the es folder a set of huge files: java_pid_xxx.hprof, generated each time my es got 'blocked'.
Now, these files are generated when the process runs out of memory.

I'm running with 100G of heap memory allocated, and I expect that for large memory-consuming operations the memory will be swapped.

Anyhow, it seems that on my machine, each time I'm running the above query, I'm killing it.

Other information:
- the jhat heap histogram can be found here: https://gist.github.com/1497379
- through JConsole, each time I'm executing this query, I can see how the heap memory increases very fast! It seems that 100G are consumed in a few seconds.
- regarding the node stats and cluster stats es provides: I cannot access them after es dies, but here is the info of es, right after restart when everything is okay:
os: {
  refresh_interval: 1000
  cpu: {
  vendor: AMD
  model: Opteron
  mhz: 1900
  total_cores: 48
  total_sockets: 4
  cores_per_socket: 12
  cache_size: 512b
  cache_size_in_bytes: 512
  }
  mem: {
  total: 126gb
  total_in_bytes: 135321870336
  }
  swap: {
  total: 64gb
  total_in_bytes: 68803354624
  }
  }
  process: {
  refresh_interval: 1000
  id: 15926
  max_file_descriptors: 128000
  }
  jvm: {
  pid: 15926
  version: 1.6.0_27
  vm_name: Java HotSpot(TM) 64-Bit Server VM
  vm_version: 20.2-b06
  vm_vendor: Sun Microsystems Inc.
  start_time: 1324275713418
  mem: {
  heap_init: 100gb
  heap_init_in_bytes: 107374182400
  heap_max: 99.9gb
  heap_max_in_bytes: 107302223872
  non_heap_init: 23.1mb
  non_heap_init_in_bytes: 24313856
  non_heap_max: 130mb
  non_heap_max_in_bytes: 136314880
  }
  }



Any suggestions?

Tnx,
Alex

