What happened to my ES cluster? Please help me.

hongsgo
My cluster consists of 3 instances, named by IP (10.32.240.15 to 10.32.240.17; I call them 15, 16 and 17).
This morning, instance 17 left the cluster.
In the elasticsearch-head plugin on instance 15, instance 17's status is "Unassigned" and 16 cannot be found at all.
What happened?
Could somebody please help me?

1. Log messages from instance 17:

[2014-04-20 03:29:28,539][INFO ][discovery.zen            ] [10.32.240.17] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2014-04-20 03:29:28,540][INFO ][cluster.service          ] [10.32.240.17] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
[2014-04-20 03:30:01,320][DEBUG][action.admin.cluster.node.stats] [10.32.240.17] failed to execute on node [a0qNnjLvQSauGEddNxKmNw]
org.elasticsearch.index.engine.EngineClosedException: [jp_listened_calcu_log][0] CurrentState[CLOSED]

2. Log messages from instance 15:
[2014-04-20 03:27:18,747][INFO ][discovery.zen            ] [10.32.240.15] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2014-04-20 03:27:18,757][INFO ][cluster.service          ] [10.32.240.15] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
[2014-04-20 03:28:28,544][WARN ][transport                ] [10.32.240.15] Received response for a request that has timed out, sent [68787ms] ago, timed out [38787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310608]
[2014-04-20 03:28:28,544][WARN ][transport                ] [10.32.240.15] Received response for a request that has timed out, sent [38787ms] ago, timed out [8787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310609]
[2014-04-20 03:28:28,552][INFO ][discovery.zen            ] [10.32.240.15] master_left [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], reason [no longer master]
[2014-04-20 03:28:28,557][INFO ][cluster.service          ] [10.32.240.15] master {new [10.32.240.15][dE_q8O-dT-SeUlTBuM-yiQ][inet[/10.32.240.15:21001]], previous [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]}, removed {[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]],}, reason: zen-disco-master_failed ([10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]])
[2014-04-20 03:29:28,546][WARN ][discovery.zen            ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
[2014-04-20 03:29:28,548][WARN ][discovery.zen            ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
        at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
        at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
        at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
        ... 7 more
[2014-04-20 03:29:28,603][WARN ][discovery.zen            ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
[2014-04-20 03:29:28,604][WARN ][discovery.zen            ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
        at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
        at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
        at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
        ... 7 more

3. The Elasticsearch process on instance 17 is still alive:

 /usr/bin/java -Xms2G -Xmx2G -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/home/irteam/apps/elasticsearch-0.90.7 -cp :/home/irteam/apps/elasticsearch-0.90.7/lib/elasticsearch-0.90.7.jar:/home/irteam/apps/elasticsearch-0.90.7/lib/*:/home/irteam/apps/elasticsearch-0.90.7/lib/sigar/* org.elasticsearch.bootstrap.ElasticSearch

4. Configuration:
cluster.name: music-es-beta
node.name: 10.32.240.15
http.port: 21200
transport.tcp.port: 21001
multicast.enabled: false
index.number_of_shards: 3
index.number_of_replicas: 1
index.mapper.dynamic: false
action.auto_create_index: false
bootstrap.mlockall: true
discovery.zen.ping.timeout: 10s
index.cache.field.type: soft
discovery.zen.ping.unicast.hosts: ["10.32.240.15", "10.32.240.16","10.32.240.17"]

5. How should I configure the ES cluster for failover and failback?

Re: What happened to my ES cluster? Please help me.

Binh Ly-2
Could be something network related. From the logs, it looks like 16 dropped out and then 17 and 15 decided that 17 was the new master. If you have not added more data since, you can restart 16 and see whether it rejoins the cluster. Regardless, you probably want to set discovery.zen.minimum_master_nodes: 2 on all 3 of your nodes, so that a node that drops out cannot form a cluster by itself and keep accepting requests.
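The value 2 follows the usual quorum rule for this setting: a strict majority of the master-eligible nodes, (N / 2) + 1 with integer division. Below is a minimal Python sketch of that arithmetic, assuming all three nodes are master-eligible (the posted configuration does not set node.master). The function names are made up for illustration and are not an Elasticsearch API.

```python
from typing import Optional


def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum: a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def check_setting(configured: Optional[int], master_eligible: int) -> str:
    """Compare a configured discovery.zen.minimum_master_nodes with the quorum.

    None stands for "not set in elasticsearch.yml".
    """
    quorum = minimum_master_nodes(master_eligible)
    if configured is None or configured < quorum:
        return "too low: two partitions could each elect a master"
    if configured > master_eligible:
        return "too high: no master can ever be elected"
    if configured > quorum:
        return "safe, but fewer node failures are tolerated"
    return "ok"
```

Here check_setting(2, 3) returns "ok", while leaving the setting unset, as in the posted configuration, gives the "too low" result. The same rule means a fourth master-eligible node would raise the quorum to 3.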

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/209b2d44-04dd-4c18-bec7-8b2b14b046dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: What happened to my ES cluster? Please help me.

Mark Walkom
In reply to this post by hongsgo
It looks like you lost connectivity between nodes; this may be due to GC.
Shut down all your nodes and add this to your config: discovery.zen.minimum_master_nodes: 2. Then restart your cluster one node at a time.
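The late masterPing responses logged on instance 15 fit that reading. Each warning gives how long ago the ping was sent and how long ago it timed out: 68787 ms - 38787 ms = 30000 ms, which is the 30s ping timeout, so node 17 answered roughly 69 seconds after the first ping was sent, and both responses were logged in the same millisecond. That is consistent with node 17 (or the link to it) stalling for over a minute, for example during a long GC pause, although the logs alone cannot tell the two apart. Below is a minimal Python sketch that extracts those numbers; the regular expression is written against the log format quoted in this thread and is an assumption for other versions.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the transport warning format quoted in this thread, e.g.
# "sent [68787ms] ago, timed out [38787ms] ago, action [discovery/zen/fd/masterPing]"
LATE_RESPONSE = re.compile(
    r"sent \[(\d+)ms\] ago, timed out \[(\d+)ms\] ago, action \[([^\]]+)\]"
)

# The two warnings logged by instance 15 at 03:28:28,544.
SAMPLE_LINES = [
    "[2014-04-20 03:28:28,544][WARN ][transport                ] [10.32.240.15] "
    "Received response for a request that has timed out, sent [68787ms] ago, "
    "timed out [38787ms] ago, action [discovery/zen/fd/masterPing], "
    "node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310608]",
    "[2014-04-20 03:28:28,544][WARN ][transport                ] [10.32.240.15] "
    "Received response for a request that has timed out, sent [38787ms] ago, "
    "timed out [8787ms] ago, action [discovery/zen/fd/masterPing], "
    "node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310609]",
]


def late_responses(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (action, sent_ms_ago, timeout_ms) for each late-response warning."""
    for line in lines:
        match = LATE_RESPONSE.search(line)
        if match:
            sent_ms = int(match.group(1))
            timed_out_ms = int(match.group(2))
            # Time between sending and timing out is the ping timeout itself.
            yield match.group(3), sent_ms, sent_ms - timed_out_ms


if __name__ == "__main__":
    for action, sent_ms, timeout_ms in late_responses(SAMPLE_LINES):
        print(f"{action}: reply arrived {sent_ms / 1000:.1f}s after sending "
              f"(timeout {timeout_ms / 1000:.0f}s)")
```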

Are you using anything like ElasticHQ, kopf or marvel to monitor things?
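Without a plugin, the status those tools display can also be read from the cluster health API (GET /_cluster/health) on each node's HTTP port, 21200 in the configuration posted above. Below is a minimal Python sketch using only the standard library. The helper names and the expected node count of 3 are assumptions for this cluster; the fields read (status, number_of_nodes, unassigned_shards) are ones the health API returns.

```python
import json
import urllib.request


def health_url(host: str, port: int = 21200) -> str:
    """Cluster health endpoint; 21200 is the http.port in the posted config."""
    return f"http://{host}:{port}/_cluster/health"


def fetch_health(host: str, port: int = 21200, timeout_s: float = 5.0) -> dict:
    """GET the health document from one node and decode the JSON body."""
    with urllib.request.urlopen(health_url(host, port), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(health: dict, expected_nodes: int = 3) -> str:
    """Report the status plus the symptoms seen in this thread, if any."""
    problems = []
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    status = health.get("status", "unknown")
    return (f"{status}: " + ", ".join(problems)) if problems else status
```

Calling summarize(fetch_health(host)) against 10.32.240.15, 10.32.240.16 and 10.32.240.17 in turn should show whether the nodes disagree about how many members the cluster has, which is what a split brain looks like from the outside.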

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 20 April 2014 17:37, hongsgo <[hidden email]> wrote:

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/what-s-happend-my-es-es-cluster-plz-help-me-tp4054448.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Re: What happened to my ES cluster? Please help me.

hongsgo
In reply to this post by Binh Ly-2
Thank you very much.

I have some new questions.

First, what is the default value of discovery.zen.minimum_master_nodes?

Second, about the isolated node 16:
while node 16 had not yet rejoined the cluster, a client sent a request to save documents to the "abc" index on node 16,
and the save succeeded.

After I restart node 16 and it rejoins the cluster, is the "abc" index data OK?
If there are no duplicate doc IDs, will it be fine
once it is merged with nodes 16, 17 and 18?

Re: What happened to my ES cluster? Please help me.

Ivan Brusic
There is no default value for minimum_master_nodes. If not set, the value is not used to determine if the cluster is whole.

If the documents do not have duplicates, they should be merged when the node rejoins the cluster. If you set minimum_master_nodes, the cluster will not accept any document inserts while it is red. The cluster will be red if only one node is present (this is what prevents split brain).
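The split-brain case can be made concrete with the partition from this thread: {15, 17} on one side, 16 alone on the other. Below is a minimal Python sketch under a deliberately simplified model, assumed for illustration: a partition elects its own master whenever it contains at least minimum_master_nodes master-eligible nodes, and an unset value imposes no requirement. It ignores the actual zen discovery election protocol, and the function name is hypothetical.

```python
from typing import List, Optional, Set


def partitions_with_master(partitions: List[Set[str]],
                           minimum_master_nodes: Optional[int]) -> List[Set[str]]:
    """Return the partitions that would elect their own master (simplified model)."""
    # Unset means no quorum check, so a single node is enough to elect itself.
    required = 1 if minimum_master_nodes is None else minimum_master_nodes
    return [p for p in partitions if len(p) >= required]


# The split described in this thread.
SPLIT = [{"10.32.240.15", "10.32.240.17"}, {"10.32.240.16"}]

if __name__ == "__main__":
    for setting in (None, 2):
        masters = len(partitions_with_master(SPLIT, setting))
        outcome = "split brain" if masters > 1 else "one master"
        print(f"minimum_master_nodes={setting}: {masters} master(s), {outcome}")
```

In this model, leaving the setting unset lets both sides elect a master, which matches node 16 accepting writes while isolated; with 2, only the {15, 17} side does.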

Cheers,

Ivan


On Mon, Apr 28, 2014 at 12:40 AM, hongsgo <[hidden email]> wrote:

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/what-s-happend-my-es-cluster-plz-help-me-tp4054448p4054890.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.
