Split brain problem in 2 node elasticsearch cluster

Gourav H Dhelaria

Version: 1.4.
Say there are 2 nodes, X and Y, both capable of becoming master.
When the network goes down, the nodes get disconnected from each other and each assumes the role of master.
When the network is restored, they neither ping each other nor re-form a single cluster.

The Elasticsearch service has to be restarted on one of the nodes for them to form a cluster. Even after they form a cluster, all primary shards remain on one node (the one on which the service was restarted), and all replica shards are on the other node.


This document

http://www.elastic.co/guide/en/elasticsearch/guide/current/_important_configuration_changes.html


mentions that there has to be an odd number of master-eligible nodes.


Queries:


1) Is there a way of avoiding the split brain problem in a 2-node cluster?

2) After the network is restored, shouldn't the nodes ping each other and form a cluster?

3) After the service is restarted to form the cluster, why don't the primary shards get distributed across both nodes?



Thanks,

Gourav

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d15234b3-0ea1-4390-b136-2f02f69cd3f5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Split brain problem in 2 node elasticsearch cluster

Mark Walkom-2
  1. Why are they becoming split anyway? GC, other load, network?
  2. Not if they both think they are masters.
  3. Are you running replicas? If so, ES doesn't really differentiate between primaries and replicas.

In reply to the post by Gourav H Dhelaria of 4 May 2015 at 15:03

Re: Split brain problem in 2 node elasticsearch cluster

Gourav H Dhelaria

1) After the network goes down, they lose communication with each other. That is when they become split.
2) They both think they are masters. Even so, shouldn't they ping to see whether there are other nodes in the cluster?
3) Number of replicas is set to 1. If ES doesn't differentiate, why are some shards primary and others replica?


In reply to the post by Mark Walkom of Monday, 4 May 2015 10:48:24 UTC+5:30

Re: Split brain problem in 2 node elasticsearch cluster

Mark Walkom-2
Your nodes aren't in different DCs, are they? If so, this is why we don't support such setups: ES is latency-sensitive, and these sorts of things can happen very easily when your network is unreliable.

They don't try to ping other nodes because you only have two, and if they lose contact with one another then they both assume they are masters and each creates its own cluster. Masters don't ping other nodes at random to see whether they should join a different cluster.

Logically there is no difference between a primary and a replica shard; the only physical difference is a flag that tells the cluster state which is which. This is why ES will never assign a primary and its replica to the same node.
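
The placement rule above can be illustrated with a minimal Python sketch. This is not Elasticsearch code: the `ShardCopy` type and the `violates_same_node_rule` helper are hypothetical names made up for this example. It checks the layout reported in this thread (two nodes, one replica per shard) and shows that "all primaries on X, all replicas on Y" does not break the rule.

```python
from collections import Counter
from typing import NamedTuple


class ShardCopy(NamedTuple):
    """One copy of a shard; hypothetical type for illustration only."""
    shard_id: int
    primary: bool  # the flag is the only thing that tells the copies apart
    node: str


def violates_same_node_rule(copies: list[ShardCopy]) -> bool:
    """Return True if any node holds more than one copy of the same shard."""
    per_node = Counter((c.shard_id, c.node) for c in copies)
    return any(count > 1 for count in per_node.values())


# Layout reported in this thread: all primaries on X, all replicas on Y.
layout = [
    ShardCopy(0, True, "X"), ShardCopy(0, False, "Y"),
    ShardCopy(1, True, "X"), ShardCopy(1, False, "Y"),
]
print(violates_same_node_rule(layout))  # False: the layout is valid
```

Because the check ignores the `primary` flag, swapping which copy is primary never changes the result, which matches the point that the two kinds of copy are logically the same.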


You cannot get around the root of your problem unless you add another node and set minimum master nodes to ensure a majority quorum.
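
As a rough illustration of what "majority quorum" means here, the sketch below computes the usual value of (master-eligible nodes / 2) + 1. In Elasticsearch 1.x this is, as far as I know, the number given to the `discovery.zen.minimum_master_nodes` setting; the function name `majority_quorum` is made up for this example.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that is more than half of the total."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (2, 3, 5):
    q = majority_quorum(n)
    # n - q is how many nodes can be lost while the rest can still elect a master.
    print(f"{n} nodes: quorum {q}, tolerates {n - q} failure(s)")
```

With 2 nodes the quorum is 2, so a quorum setting would stop split brain only by leaving no master at all whenever either node is unreachable. With 3 nodes the quorum is still 2, and one node can be lost.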

In reply to the post by Gourav H Dhelaria of 4 May 2015 at 15:27

Re: Split brain problem in 2 node elasticsearch cluster

Jason Wee
Why must you have only two nodes? Would it be possible to add one more node so that split brain does not become an issue?

jason

In reply to the post by Mark Walkom of Mon, May 4, 2015 at 2:20 PM

Re: Split brain problem in 2 node elasticsearch cluster

Gourav H Dhelaria
Looks like the only way around this would be to add more nodes and set minimum masters to ensure a majority quorum.

Thanks.



Gourav
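
The conclusion above can be sketched as a toy simulation in Python. It is not how Elasticsearch elects a master: the `masters_after_partition` function and the lowest-name tie-break are assumptions made for this example. The only rule it models is that a side of a network partition may elect a master only if it can see at least `minimum_master_nodes` master-eligible nodes.

```python
def masters_after_partition(
    partitions: list[set[str]], minimum_master_nodes: int
) -> list[str]:
    """Return the master elected on each side of a partition, if any.

    A side elects a master only if it contains at least
    `minimum_master_nodes` nodes; the lowest node name wins (an arbitrary
    tie-break chosen for this sketch).
    """
    masters = []
    for side in partitions:
        if len(side) >= minimum_master_nodes:
            masters.append(min(side))
    return masters


# Two nodes, quorum of 1: both sides elect a master (split brain).
print(masters_after_partition([{"X"}, {"Y"}], 1))       # ['X', 'Y']
# Three nodes, quorum of 2: only the majority side elects a master.
print(masters_after_partition([{"X"}, {"Y", "Z"}], 2))  # ['Y']
```

In the three-node case the isolated node X elects nothing, so when the network comes back there is only one cluster for it to rejoin.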

In reply to the post by Jason Wee of Monday, 4 May 2015 12:02:27 UTC+5:30

Re: Split brain problem in 2 node elasticsearch cluster

Ivan Brusic
In reply to this post by Jason Wee

In non-"big data" scenarios, a database is run on two servers simply to achieve high availability. Most databases use a master client scenario, but Elasticsearch does not support such a setup. It really should, because not everyone has tons of data.

Ivan, not affiliated with the OP
