interesting CPU load (without actual traffic load).

jacque74
Hello, every once in a while we get into a state where one of our
servers reports high user and system CPU, as indicated on this graph:

http://bit.ly/yAQyPe

As you can tell, the rest of the cluster is pretty much idle, while
img699 is continuously hot on CPU:

top - 14:28:46 up 737 days, 18:21,  1 user,  load average: 12.03, 10.37, 10.25
Tasks: 125 total,   1 running, 124 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.9%us, 33.9%sy,  0.0%ni, 18.8%id, 11.7%wa,  0.2%hi,  0.5%si,  0.0%st
Mem:  16472372k total, 16387968k used,    84404k free,     5952k buffers
Swap:  9775544k total,     5632k used,  9769912k free,  6111504k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8625 root      20   0 9113m 8.4g  10m S 513.6 53.6   9021:31 java

Please see the attached jstack output:

http://pastebin.com/HFUAn6ra

I am not really sure what it's doing. All of the health status
indicators are idle, there is no merge or flush in progress, and left
alone, this server will stay hot for days. The only way to resolve this
is to restart the process.

Shay, please let me know what you think.

-Jack
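
One way to narrow down a case like this is to find out which Java threads are actually burning the CPU. `top -H -p <pid>` lists per-thread CPU usage with decimal thread IDs, and a HotSpot jstack dump prints the same IDs in hex as `nid=0x...`. The Python sketch below pairs the two. The helper names and the sample dump are made up for illustration and are not taken from the pastebin above.

```python
import re

# Header line of a thread in a HotSpot jstack dump, for example:
# "worker-1" daemon prio=10 tid=0x00007f0a1c001000 nid=0x21b3 runnable [0x...]
THREAD_HEADER = re.compile(r'^"(?P<name>.*)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def threads_by_nid(jstack_text):
    """Return {native thread id (int): thread name} parsed from jstack output."""
    result = {}
    for line in jstack_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            result[int(match.group("nid"), 16)] = match.group("name")
    return result


def name_hot_threads(hot_tids, jstack_text):
    """Pair each decimal thread ID from `top -H` with its Java thread name."""
    names = threads_by_nid(jstack_text)
    return [(tid, hex(tid), names.get(tid, "<not in dump>")) for tid in hot_tids]


# Hypothetical sample dump, only here so the sketch can be run as-is.
SAMPLE_DUMP = '''\
"example-worker-1" daemon prio=10 tid=0x00007f0a1c001000 nid=0x21b3 runnable [0x00007f0a0b1f0000]
   java.lang.Thread.State: RUNNABLE
"example-worker-2" daemon prio=10 tid=0x00007f0a1c002000 nid=0x21b4 waiting on condition [0x00007f0a0b0ef000]
   java.lang.Thread.State: WAITING (parking)
'''

if __name__ == "__main__":
    # 8627 and 8628 stand in for thread IDs copied from `top -H`.
    for tid, nid, name in name_hot_threads([8627, 8628], SAMPLE_DUMP):
        print(tid, nid, name)
```

In practice the dump text would come from `jstack <pid>` taken at roughly the same moment as the `top -H` reading, since thread activity can change between the two.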


Re: interesting CPU load (without actual traffic load).

kimchy
Administrator
Thanks for the stack trace. From what I can see, there are several ongoing stats requests on that node, and I can see how this might happen if a shard is being closed, or was closed, while they were executing (I can't tell from the stack trace whether that's the case). I fixed it here: https://github.com/elasticsearch/elasticsearch/issues/1772. Otherwise, the only other thing I can think of is that the networking lib is causing it (there were some bugs related to that in older versions, though they have been fixed, and it's not evident from the stack trace that this is happening).
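
A quick way to spot "several threads stuck in the same call" in a dump like this is to count the top stack frame of every RUNNABLE thread. The Python sketch below does that for HotSpot-style jstack text. The function name and the `com.example` frames in the sample are hypothetical; they are not Elasticsearch classes and not the contents of the actual trace.

```python
import re
from collections import Counter

# A stack frame line looks like: "\tat com.example.Stats.collect(Stats.java:10)"
FRAME = re.compile(r'^\s+at (\S+)\(')


def top_frames(jstack_text, state="RUNNABLE"):
    """Count the top stack frame of every thread in the given state."""
    counts = Counter()
    in_state = False     # current thread is in the requested state
    seen_frame = False   # top frame of the current thread already counted
    for line in jstack_text.splitlines():
        if line.startswith('"'):
            # A quoted name starts a new thread section.
            in_state = False
            seen_frame = False
        elif "java.lang.Thread.State:" in line:
            words = line.split(":", 1)[1].split()
            in_state = bool(words) and words[0] == state
        elif in_state and not seen_frame:
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                seen_frame = True
    return counts


# Hypothetical sample dump, only here so the sketch can be run as-is.
SAMPLE_DUMP = '''\
"example-1" daemon prio=10 tid=0x01 nid=0x10 runnable [0x01]
   java.lang.Thread.State: RUNNABLE
    at com.example.Stats.collect(Stats.java:10)
    at com.example.Handler.run(Handler.java:20)

"example-2" daemon prio=10 tid=0x02 nid=0x11 runnable [0x02]
   java.lang.Thread.State: RUNNABLE
    at com.example.Stats.collect(Stats.java:10)
    at com.example.Handler.run(Handler.java:20)

"example-3" daemon prio=10 tid=0x03 nid=0x12 waiting on condition [0x03]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
'''

if __name__ == "__main__":
    for frame, count in top_frames(SAMPLE_DUMP).most_common():
        print(count, frame)
```

A frame that dominates the RUNNABLE count across several dumps taken a few seconds apart is a reasonable first suspect, though it does not by itself prove a spin loop.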

On Friday, March 9, 2012 at 12:31 AM, Jack Levin wrote:

Hello, every once in a while we get into a state where one of our
servers reports high user and system CPU [...]


Re: interesting CPU load (without actual traffic load).

jacque74
Shay, is there a stable version of ES I can get with this fix?
Otherwise, where can I get it?

-jack

On Mar 8, 4:04 pm, Shay Banon <[hidden email]> wrote:

> Thanks for the stack trace. From what I can see, there are several ongoing stats requests on that node [...]

Re: interesting CPU load (without actual traffic load).

kimchy
Administrator
The fix has been applied to both the master and 0.19 branches. It will be part of the 0.19.1 release (due either this week or next). You can easily build a version yourself to test it: clone or download the 0.19 branch from GitHub and run "mvn package -DskipTests". The distribution files will be under target/release.
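
The build steps above can be sketched as a small Python wrapper around `git` and `mvn`. This is only an illustration: the repository URL is inferred from the issue link earlier in the thread, the function names and the `dest` directory are made up, and it assumes the 0.19 branch is still available at that URL.

```python
import subprocess

# Assumption: repository URL inferred from the issue link in this thread.
REPO_URL = "https://github.com/elasticsearch/elasticsearch.git"


def build_commands(branch="0.19", dest="elasticsearch"):
    """Return (argv, working directory) pairs for a source build."""
    return [
        # Clone only the requested branch into `dest`.
        (["git", "clone", "--branch", branch, REPO_URL, dest], None),
        # Build the distribution without running the test suite.
        (["mvn", "package", "-DskipTests"], dest),
    ]


def run_build(branch="0.19", dest="elasticsearch"):
    """Run each step in order, raising on the first failure."""
    for argv, cwd in build_commands(branch, dest):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_build()` needs network access plus `git`, Maven, and a JDK on the PATH. According to the post, the distribution files then end up under `target/release` inside the checkout.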

On Saturday, March 10, 2012 at 4:37 AM, Jack Levin wrote:

Shay, is there a stable version of ES I can get with this Fix?
Otherwise where can I get it?

-jack
