Multiple masters elected during cluster crash - question about data consistency

Marek Skorek
Hi,

I ran into an OOME (OutOfMemoryError) on all three nodes in my cluster.

After I restarted each node from the system shell (concurrently), node 1 started first and was elected master. Right after that, nodes 2 and 3 started too, but they could not yet see that node 1 was up, so node 2 was also elected master.
Then I saw that my single three-node cluster had split into two clusters (one with a single node and the other with the remaining two nodes). They started to work independently, and two rivers were started concurrently, one on each "cluster".

The question is:

If two rivers are working concurrently, is there any chance that after fixing the situation and "merging" the broken clusters back into a single one, all the data will be available/indexed? Or does my river need to take care of data consistency?

Thanks for all your advice :-)

Regards,
Marek.

Re: Multiple masters elected during cluster crash - question about data consistency

Igor Motov
It's possible that after the "merge" you will end up with two copies of a shard that are supposed to hold the same data but actually contain different sets of documents, because they were part of different clusters. This is a somewhat difficult situation to recover from. The best thing you can do in this case is to remove one of the copies (for example, by temporarily setting the number of replicas to 0) and then reindex the missing records.
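As a rough sketch of the replica step, the code below builds requests for Elasticsearch's index update-settings endpoint (`PUT /<index>/_settings`): first dropping `number_of_replicas` to 0, which leaves only the primary copy of each shard, then restoring it so new replicas are copied from that primary. The host `localhost:9200`, the index name `my_index`, the helper `replica_settings_request`, and the original replica count of 1 are all hypothetical assumptions; the helper only builds the requests and does not send them. Note that the surviving copy is whichever one is currently primary, not one you choose, so documents that existed only in the discarded copy still have to be reindexed.

```python
import json
import urllib.request

# Hypothetical defaults: replace with your own cluster address and index.
ES_HOST = "http://localhost:9200"
INDEX = "my_index"


def replica_settings_request(replicas: int, host: str = ES_HOST,
                             index: str = INDEX) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request
    that sets index.number_of_replicas to `replicas`."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Step 1: drop all replica copies; only the primaries remain.
drop = replica_settings_request(0)

# Step 2 (once the cluster is healthy again): re-create replicas, which are
# recovered from the surviving primaries. Assumes the index originally had 1.
restore = replica_settings_request(1)

# To actually send a request against a live cluster:
# urllib.request.urlopen(drop)
```

After the replicas are rebuilt, every copy of a shard matches its primary again; the river (or whatever feeds the index) then needs to re-send the records that only the discarded copy had received.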

If you haven't done so already, I would recommend setting discovery.zen.minimum_master_nodes to 2 (more than half of the master-eligible nodes in your cluster) to prevent this situation from happening in the future.
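The "more than half" rule is simple arithmetic: with n master-eligible nodes the quorum is floor(n/2) + 1, which is 2 for a three-node cluster (set in `elasticsearch.yml` as `discovery.zen.minimum_master_nodes: 2`). The sketch below models only that check; the function names are hypothetical, and this is a simplification, not zen discovery's actual election code. Applied to the split from the original question, only the two-node side can elect a master.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum: strictly more than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it sees a quorum of
    master-eligible nodes (counting itself)."""
    return partition_size >= minimum_master_nodes(master_eligible)


# The split described in the question: {node 1} and {node 2, node 3}.
total = 3
partitions = [1, 2]
print(minimum_master_nodes(total))                       # 2
print([can_elect_master(p, total) for p in partitions])  # [False, True]
```

Two disjoint partitions can never both reach a quorum, because their sizes would then add up to more than n; the lone node simply waits instead of forming a second cluster.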


On Wednesday, November 28, 2012 3:07:48 AM UTC-5, scoro wrote:

Re: Multiple masters elected during cluster crash - question about data consistency

Marek Skorek
Thank you, Igor. That is the answer I was expecting :-)

On Thursday, 29 November 2012 02:50:05 UTC+1, Igor Motov wrote: