Cross-DC clusters - specific dangers


Cross-DC clusters - specific dangers

AndrewK
I am aware that cross-data-center clusters are not recommended, since they violate one of the core assumptions of ES, namely that all nodes are equal. But what *specifically* (apart from the obvious problems associated with network failure) can this lead to? Is it just high or "irregular" latency and the difficulty of debugging issues when node response times are unequal, or can more critical issues such as split brain also arise from this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a8fe93fb-dfd1-414a-86d7-2ebad66107a4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Cross-DC clusters - specific dangers

joergprante@gmail.com
Split-brain risk is not related to latency; it can happen on any network that is dynamic.

The main issue is latency, yes. This is a killer. If latency is too high, real-time systems can be seen as unusable from a user perspective.

The second issue is network bandwidth. LAN traffic is an order of magnitude faster than WAN traffic.

Another issue is timing. ES does not have vector clocks yet. That means a node does not distinguish between a local time and a global time; instead, the cluster assumes all nodes share the same clock. As a consequence, the ES code for indexing and search is relatively easy to maintain (it is assumed the causality rule "I write first, then you can read next" is never broken). In a distributed system, this rule is no longer 100% true when reads and writes are intertwined and forwarded to other nodes, and missing coordination can lead to all kinds of strange effects (hangs, missing data, wrong data, conflicts, just to name a few). These effects are expected to be more frequent on a cluster that spans DCs. I expect ES 2.0 will be a step forward: it seems it will introduce sequence numbers for operations, and probably a distributed clock.
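To make the timing problem concrete, here is a toy Python sketch. It is not Elasticsearch code, and the `Write` record, the 500 ms clock skew, and the single counter handing out `seq_no` values are all assumptions chosen for illustration. It shows how ordering writes by each node's wall clock can pick a stale value, while ordering by a sequence number keeps "I write first, then you can read next" intact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    wall_clock_ms: int  # timestamp read from the writing node's own clock
    seq_no: int         # hypothetical counter assigned by one coordinating node


def latest_by_wall_clock(writes):
    """Pick the 'newest' write by trusting every node's local clock."""
    return max(writes, key=lambda w: w.wall_clock_ms)


def latest_by_seq_no(writes):
    """Pick the newest write by the order in which operations were accepted."""
    return max(writes, key=lambda w: w.seq_no)


# Node A (DC 1) writes first at true time 1000 ms, but its clock runs 500 ms fast.
first = Write("v1", wall_clock_ms=1_000 + 500, seq_no=1)
# Node B (DC 2) writes second at true time 1200 ms with an accurate clock.
second = Write("v2", wall_clock_ms=1_200, seq_no=2)
writes = [first, second]

print(latest_by_wall_clock(writes).value)  # the older write wins under clock skew
print(latest_by_seq_no(writes).value)      # the later write wins
```

With these made-up numbers the wall-clock comparison returns the older write, which is the kind of "wrong data" effect described above; the sequence-number comparison does not depend on the clocks at all.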

With snapshot/restore, data can already be transported between two ES clusters in two DCs. By paying the price of lagging behind a leading cluster, another cluster can be set up as a follower cluster quite easily, keeping latency low and working around the timing challenge.
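As an illustration only, the leader/follower idea can be sketched as two REST calls against the snapshot API: one to take a snapshot on the leading cluster and one to restore it on the follower. The Python below only builds the requests and does not send them. The host names, repository name, snapshot name, and index name are hypothetical, and the sketch assumes a snapshot repository with that name has already been registered on both clusters and points at storage both DCs can reach.

```python
import json


def snapshot_request(leader_url, repo, snapshot, indices):
    """Describe the call that snapshots `indices` on the leading cluster."""
    url = f"{leader_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", url, json.dumps({"indices": indices}))


def restore_request(follower_url, repo, snapshot, indices):
    """Describe the call that restores the same snapshot on the follower cluster."""
    url = f"{follower_url}/_snapshot/{repo}/{snapshot}/_restore"
    return ("POST", url, json.dumps({"indices": indices}))


def sync_plan(leader_url, follower_url, repo, snapshot, indices):
    """Order matters: the snapshot must complete before the restore starts."""
    return [
        snapshot_request(leader_url, repo, snapshot, indices),
        restore_request(follower_url, repo, snapshot, indices),
    ]


# All names below are placeholders, not values from this thread.
plan = sync_plan(
    leader_url="http://es-dc1.example:9200",
    follower_url="http://es-dc2.example:9200",
    repo="shared_repo",
    snapshot="snap_001",
    indices="logs",
)
for method, url, body in plan:
    print(method, url, body)
```

Run on a schedule, a plan like this leaves the follower behind the leader by roughly the snapshot interval plus transfer time. Restoring over an index that already exists and is open on the follower would need extra handling (for example closing it first), which the sketch leaves out.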

Jörg

On Tue, Apr 14, 2015 at 8:53 AM, AndrewK <[hidden email]> wrote:
I am aware that cross-data-center clusters are not recommended, since they violate one of the core assumptions of ES, namely that all nodes are equal. But what *specifically* (apart from the obvious problems associated with network failure) can this lead to? Is it just high or "irregular" latency and the difficulty of debugging issues when node response times are unequal, or can more critical issues such as split brain also arise from this?
