Unexpected cluster behavior

Unexpected cluster behavior

Andrej
Hi all,

we noticed some very strange behavior in our cluster last night (16
nodes, all running 0.19.4). At a certain time one cluster node
apparently restarted (at least that is how I would interpret this log
message):

[2012-05-29 23:33:44,090][INFO ][node                     ] [Star-Lord] {0.19.4}[23335]: starting ...

Within the next half hour another 5 nodes restarted, some of them
several (up to 4) times. In the logs we couldn't find anything like
"stopping" or "stopped", which is the usual output when stopping a node.

My questions:
- is it possible to start a node that wasn't stopped before?
- what can actually start an elasticsearch node? At that time we weren't
using the index, so no requests were sent from our side. Can
elasticsearch itself restart nodes and, if so, what triggers this?
- finally, after restarting no indices were found. Any ideas on this
one?

Shay, if you are interested in more information I can send you all
logs. The beer in Berlin is on me ;-)

Thanks!
Andrej

Re: Unexpected cluster behavior

kimchy
Administrator
Can you share the logs? That would be helpful.

When a node shuts down properly (i.e., via a kill command (without -9) or the shutdown API), it will log the fact that it's stopping. Obviously, nodes can't start by themselves unless they are being run by another daemon. Are you using something like that? The service wrapper, maybe?
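
A rough way to check a node's log for this pattern is to look for "starting" events that are not preceded by a "stopping" or "stopped" event. The following is only a minimal Python sketch: the line layout is inferred from the single log line quoted in this thread, the function name find_unclean_restarts is made up for illustration, and it assumes that clean shutdowns log a message containing "stopping" or "stopped".

```python
import re

# Assumed line shape, inferred from the one log line quoted in this thread:
# [timestamp][LEVEL][component] [node name] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[^\]]+)\]\[(?P<component>[^\]]+)\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*(?P<msg>.*)$"
)


def find_unclean_restarts(lines):
    """Return timestamps of 'starting' events with no stop event before them.

    Hypothetical helper: the first start in the log is never flagged,
    because nothing is known about what happened before the log begins.
    """
    unclean = []
    seen_start = False       # True once any earlier start has been seen
    stopped_cleanly = False  # True if a stop was logged since the last start
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("component").strip() != "node":
            continue  # only lifecycle messages from the [node] logger matter
        msg = m.group("msg")
        if "stopping" in msg or "stopped" in msg:
            stopped_cleanly = True
        elif "starting" in msg:
            if seen_start and not stopped_cleanly:
                unclean.append(m.group("ts"))
            seen_start = True
            stopped_cleanly = False
    return unclean


if __name__ == "__main__":
    log = [
        # Hypothetical earlier start, made up for illustration.
        "[2012-05-29 20:00:00,000][INFO ][node                     ] [Star-Lord] {0.19.4}[111]: starting ...",
        # The line quoted in the original post.
        "[2012-05-29 23:33:44,090][INFO ][node                     ] [Star-Lord] {0.19.4}[23335]: starting ...",
    ]
    print(find_unclean_restarts(log))  # ['2012-05-29 23:33:44,090']
```

A start that shows up this way means the previous process went away without a clean shutdown (kill -9, a crash, or an external supervisor restarting the JVM), so the supervisor's own log is the next place to look.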

On Wed, May 30, 2012 at 2:53 PM, Andrej Rosenheinrich <[hidden email]> wrote:
> [...]


Re: Unexpected cluster behavior

Andrej
Shay, your hint was absolutely the right one. Reading the logs from
the service wrapper revealed that the wrapper decided to restart the
JVM because it did not respond in the expected way. So at least we have
an explanation, and I can sleep well again ;-)

It was nice to talk to you in Berlin. Thanks for the information and
your interesting talk. Can't wait to try out 0.19.5 ;-)

Greets,
Andrej