Zen, Split Brain and the Art of Master Election


Paul Smith
I had originally thought I had seen some information about how the Zen discovery protocol deals with the Split Brain problem, but alas now I can't find it, or it wasn't there and I was dreaming it up in a caffeine-fuelled haze.

I spotted a recent pull request/discussion about Zookeeper (Issue #1057). I know Zookeeper is designed around that quorum process and, as I understand it (which is to say, I probably don't), has defences against Split Brain.

Given the catastrophe of a split brain scenario, where rogue splinter cells of the cluster go off on their own, practically guaranteeing data loss/corruption, is there anything specific one should know about Zen and Split Brain?

It's obviously not an easy problem to solve; I'd just like to be 100% clear with my team about whether Zen does or does not address (or perhaps partially addresses) this evil scenario. Obviously Zookeeper is a much larger beast to be pulling into ES, and Zen is simple and fairly elegant. Shay, what are your thoughts on this matter in practical terms?



Re: Zen, Split Brain and the Art of Master Election

kimchy
Administrator
Heya,

Actually, I did some work to improve on that in master: https://github.com/elasticsearch/elasticsearch/issues/1079. This solves many of the split brain cases and makes Zen more usable in this respect.

-shay.banon

On Thursday, July 7, 2011 at 1:25 AM, Paul Smith wrote:
[...]




Re: Zen, Split Brain and the Art of Master Election

Karussell
I read about this split brain problem recently. Why is it "practically
guaranteeing data loss/corruption"? Could someone point me to a
document where this is explained?

I only found Wikipedia and this one:

http://unixworld2010.wordpress.com/2011/01/10/how-vcs-avoids-split-brain/

"When this happens, systems on both sides of the partition can restart
applications from the other side resulting in duplicate services, or
'split-brain'"

Why can't a healthy service on a node prevent itself from being
restarted or duplicated?

Regards,
Peter.

On 7 Jul., 00:30, Shay Banon wrote:
> [...]

Re: Zen, Split Brain and the Art of Master Election

Paul Smith
In reply to this post by kimchy


Thanks Shay, that looks like it covers many of the cases I can think of. My boss has a background in this from his work in SGI's filesystem group (CXFS), with a LOT of burn marks from large clusters and real-world split brain issues, so he's just being diligent in asking. If it ever happened, it would be a world of hurt to clean up (for example, if reindexing took a bloody long time).

Re: Zen, Split Brain and the Art of Master Election

Paul Smith
In reply to this post by Karussell
In a truly pathological case, where two halves of a cluster don't know about each other, each side elects its own master (unless, say, this Zen extension prevents one side from doing that because it realises there isn't a quorum on its side).

You now have two clusters, both of which think they're valid. With two masters, each would be trying to re-replicate shards, work towards green health, and shuffle things around. This might not be SO bad (but it probably is), but if you have clients of these two clusters, perhaps split between the two evil clusters, then, say, half the updates go to one and half to the other.

You now have two clusters, neither of which has the true index state. I'm pretty sure the only recourse then, after addressing the connectivity issues (fixing the split brain), is to blow away the index and reindex, because you can't trust either half.

After something like this, an Index Verifier process (something we're working on) could end up repairing a dodgy cluster more quickly than a full reindex, but once you're in split brain, data integrity is pretty much lost as soon as a single update lands.

That's how I read it anyway.

Paul
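To make Paul's point concrete, here is a minimal toy model in Python. It is a hypothetical illustration, not Elasticsearch code: the `Partition` class and the `diverged_ids` helper are made-up names. Each half of a partitioned cluster keeps accepting writes from its own clients, and after a single update per side neither half holds the complete index state.

```python
# Hypothetical toy model of split brain: each partition has its own master
# and accepts writes independently. Not Elasticsearch code.
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    docs: dict = field(default_factory=dict)

    def index(self, doc_id: str, value: str) -> None:
        # The local master accepts the write with no knowledge of the other half.
        self.docs[doc_id] = value


def diverged_ids(a: Partition, b: Partition) -> set:
    """Return the document ids whose state differs between the two halves."""
    ids = a.docs.keys() | b.docs.keys()
    return {i for i in ids if a.docs.get(i) != b.docs.get(i)}


# Before the split, both halves hold the same index state (copied, not shared).
baseline = {"1": "v1", "2": "v1"}
left = Partition("left", dict(baseline))
right = Partition("right", dict(baseline))

# Clients are split between the halves, so each update lands on one side only.
left.index("1", "v2")    # seen only by the left half
right.index("2", "v2")   # seen only by the right half
right.index("3", "v1")   # new document that exists only on the right

print(sorted(diverged_ids(left, right)))  # ['1', '2', '3']
```

Any merge of `left` and `right` has to pick a winner per document, which is why the realistic options are a full reindex or something like an index verifier.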

On 7 July 2011 09:03, Karussell wrote:
[...]


Re: Zen, Split Brain and the Art of Master Election

kimchy
Administrator
Paul explained well what a split brain is and what the problem with it is. As he also noted, with the new improvement to Zen discovery, nodes that don't see "enough" nodes in the cluster (and it's up to you to define what "enough" is) will disconnect and try to join the cluster again. They will only rejoin once they see enough master-eligible nodes.
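A common choice for "enough" is a strict majority of the master-eligible nodes. The sketch below assumes that choice; the function names are hypothetical and are not Elasticsearch APIs. A node takes part in master election only when it can see at least `minimum` master-eligible nodes, itself included.

```python
# Hypothetical sketch of a majority-quorum rule; names are illustrative only.
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def may_elect_master(visible_master_eligible: int, minimum: int) -> bool:
    """A node only takes part in master election (or rejoins) if it can see
    at least `minimum` master-eligible nodes, counting itself."""
    return visible_master_eligible >= minimum


total = 5
minimum = quorum(total)  # 3

# A network partition splits the five master-eligible nodes 3 / 2.
for side, visible in (("majority", 3), ("minority", 2)):
    print(side, may_elect_master(visible, minimum))
# majority True
# minority False
```

A strict majority works because two disjoint halves cannot both contain more than half of the nodes, so at most one side of any partition can elect a master. With an even count such as 4, a 2/2 split leaves neither side able to elect one, trading availability for safety. Setting the threshold below a majority reintroduces the risk. In later Elasticsearch releases this threshold appears to correspond to the `discovery.zen.minimum_master_nodes` setting.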

On Thursday, July 7, 2011 at 2:25 AM, Paul Smith wrote:
[...]



Re: Zen, Split Brain and the Art of Master Election

Karussell
Thanks!

On 7 Jul., 16:28, Shay Banon wrote:
> [...]