cross data center replication

cross data center replication

Saikat Kanjilal
Hello Folks,
I dug through the documentation and found a few wiki posts in places, but nothing that seems to answer this question directly: as of the latest release, does ES currently support cross data center replication out of the box? I've seen a post or two about use cases where folks are running ES on top of a key-value store that supports this, like Couchbase, but nothing to indicate that ES itself has support for this. Some insight or links to docs regarding this would be very helpful.

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Re: cross data center replication

Daniel Maher-3
On 2013-04-23 5:22 PM, Saikat Kanjilal wrote:
> Hello Folks,
> I dug through the documentation and found a few wiki posts in places,
> but nothing that seems to answer this question directly: as of the
> latest release, does ES currently support cross data center
> replication out of the box? I've seen a post or two about use cases where
> folks are running ES on top of a key-value store that supports this, like
> Couchbase, but nothing to indicate that ES itself has support for this.
> Some insight or links to docs regarding this would be very helpful.

Hello,

I'd wager that the question you're really asking about is how to control
where shards are placed; if you can make deterministic statements about
where shards are, then you can create your own "rack-aware" or "data
centre-aware" scenarios.  ES has supported this "out of the box" for
well over a year now (possibly longer).

You'll want to investigate "zones" and "routing allocation", which are
the key elements of shard placement.  There is an excellent blog post
which describes exactly how to set things up here :
http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
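
As a rough illustration of the "zones" and "routing allocation" idea, the sketch below builds a request body for the cluster settings API (PUT /_cluster/settings). It assumes each node was started with a custom attribute named `zone` set to its data centre (for example `node.zone: dc1` in older releases, `node.attr.zone: dc1` in newer ones). The attribute name, the values `dc1` and `dc2`, and the helper `awareness_settings` are placeholders for illustration and are not taken from the blog post.

```python
import json


def awareness_settings(attribute, values):
    """Build a cluster-settings body for allocation awareness on `attribute`."""
    force_key = "cluster.routing.allocation.awareness.force.%s.values" % attribute
    return {
        "persistent": {
            # Spread the copies of each shard across nodes that have
            # different values of this node attribute.
            "cluster.routing.allocation.awareness.attributes": attribute,
            # Forced awareness: list every expected value so that replicas
            # are not all piled into the surviving zone when one zone is down.
            force_key: ",".join(values),
        }
    }


# Hypothetical two-data-centre layout.
body = awareness_settings("zone", ["dc1", "dc2"])
print(json.dumps(body, indent=2, sort_keys=True))
```

Note that this only controls where shard copies live inside one cluster; it does not replicate data between separate clusters.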

Enjoy !

--
dan (phrawzty).
mozilla webops; european outpost.

Re: cross data center replication

Norberto Meijome
Along the same lines, once you have your zones and shards allocated across different DCs, is it possible to have queries originating in DC #1 stay in DC #1? That is, how can we stop the ES nodes from distributing the queries across all nodes? Alternatively, is there a way to tell the ES cluster about 'shard distance' (so that queries are optimised to minimise shard distance)?

Thanks!!
Beto


On Wed, Apr 24, 2013 at 1:44 AM, Daniel Maher <[hidden email]> wrote:
> [...] You'll want to investigate "zones" and "routing allocation", which are the key elements of shard placement. [...]

Re: cross data center replication

Radu Gheorghe-2
Hello Beto,

I haven't used this feature (yet), but there are some options you can specify at query time for shard preference.
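
For illustration, here is a minimal sketch of passing a shard preference at query time through the `preference` query-string parameter of the search API. The host, the index name `logs`, the `NODE_ID` placeholder and the helper `search_url` are assumptions; `_local` and `_only_node:<id>` are the option names documented for releases of that era, so check which values your version accepts.

```python
from urllib.parse import urlencode


def search_url(base, index, preference):
    """Return a search URL that carries a shard `preference` value."""
    query = urlencode({"preference": preference})
    return "%s/%s/_search?%s" % (base.rstrip("/"), index, query)


# Prefer shard copies held by the node that receives the request.
local_url = search_url("http://localhost:9200", "logs", "_local")

# Restrict the search to shard copies on one specific node (by node id).
node_url = search_url("http://localhost:9200", "logs", "_only_node:NODE_ID")

print(local_url)
print(node_url)
```

Sending each data centre's queries to a local node with `_local` keeps searches on that node where it holds a copy of the shard, but it does not by itself confine a query to a whole data centre.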

Best regards,
Radu
-- 
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Wed, Apr 24, 2013 at 1:37 PM, Norberto Meijome <[hidden email]> wrote:
> [...] is there a way to tell the ES cluster about 'shard distance' [...]

Re: cross data center replication

Norberto Meijome
Thanks guys. Yeah, I had noticed those options. It seems I'll need an external service (ZK) to synchronise the state of the ES cluster to the app, given that the only option is to specify a node by node_id rather than by a property of the node (where it is located, for example).

It does sound like a useful thing to have, IMO. In case it isn't obvious, I'm running ES on AWS.

cheers,
Beto


On Wed, Apr 24, 2013 at 9:26 PM, Radu Gheorghe <[hidden email]> wrote:
> [...] you have some options you can specify at query time for shard preference [...]

Re: cross data center replication

phill
In reply to this post by Daniel Maher-3
On 4/23/2013 8:44 AM, Daniel Maher wrote:

> On 2013-04-23 5:22 PM, Saikat Kanjilal wrote:
>> Hello Folks,
>> [...] does ES out of the box currently support cross data
>> center replication,  [....]
>
> Hello,
>
> I'd wager that the question you're really asking about is how to
> control where shards are placed; if you can make deterministic
> statements about where shards are, then you can create your own
> "rack-aware" or "data centre-aware" scenarios.  ES has supported this
> "out of the box" for well over a year now (possibly longer).
>
> You'll want to investigate "zones" and "routing allocation", which are
> the key elements of shard placement.  There is an excellent blog post
> which describes exactly how to set things up here :
> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/ 
>
>
Is shard allocation really the correct solution if the data centers are
globally distributed?

If I have a data center in the US intended to serve data from the US,
but it should also have access to Europe and Asia data, and clusters in
both Europe and Asia with similar needs, would I really want to use
zones etc. and have one great global cluster with data center aware
configurations?

Assuming that the US would be happy to deal with old documents from Asia
and Europe, when Asia or Europe is off line or just not caught up, it
would seem that you would NOT want a "world" cluster, because I can't
picture how you'd configure a 3-part world cluster to both index into
the right indices and search the right (possible combination of) shards,
while also preventing "split brain".

In the scenario I've described, I would think each data center might
better provide availability and eventual consistency (with less concern
for the remote data from the other region) by having three clusters and
some type of syncing from one index to copies at the other two
locations.  For example, the US datacenter might have a US,
copyOfEurope, and copyOfAsia index.
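
To make the three-cluster layout concrete, here is a small sketch of which indices each data center would hold: one index it owns and writes to, plus a synced copy of each remote region's index. The helper `indices_for` and the region names are hypothetical, and because Elasticsearch requires lowercase index names, the sketch uses `copy_of_europe` as a stand-in for `copyOfEurope`.

```python
def indices_for(local_region, regions):
    """List the indices one data center holds under the three-cluster layout."""
    names = []
    for region in regions:
        if region == local_region:
            # The local region's own index: the only one written to locally.
            names.append(region)
        else:
            # A copy synced from the remote cluster; it may lag behind.
            names.append("copy_of_" + region)
    return names


REGIONS = ["us", "europe", "asia"]
for dc in REGIONS:
    print(dc, "->", indices_for(dc, REGIONS))
```

A search in the US cluster could then target any combination of these three indices without depending on the remote clusters being reachable.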

Anyone have any observations about such a world-wide scenario?
Are there any index-to-index copy utilities?
Is there a river or other plugin that might be useful for this
three-clusters-working-together scenario?
How about the project https://github.com/karussell/elasticsearch-reindex?
Comments?

-Paul


Re: cross data center replication

Norberto Meijome
+1 on all of the above. es-reindex is already on my list of things to investigate (for a number of issues...)

cheers,


On Wed, May 1, 2013 at 6:58 AM, Paul Hill <[hidden email]> wrote:
> [...] How about the project https://github.com/karussell/elasticsearch-reindex? [...]

Re: cross data center replication

Todd Nine
Hey all,
 
Sorry to resurrect a dead thread.  Did you ever find a solution for eventual consistency of documents across EC2 regions?

Thanks,
todd



On Wednesday, May 1, 2013 5:50:00 AM UTC-7, Norberto Meijome wrote:
> +1 on all of the above. [...]

Re: cross data center replication

MatthewParrott
I'm interested in this too.
es-reindex seems like it lacks conflict resolution, and as noted in the docs, would be better implemented as a river.
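
One common way to get conflict resolution when copying documents between clusters is last-writer-wins on a version number carried with each document, which is roughly the rule that Elasticsearch's `version_type=external` indexing option enforces. The in-memory sketch below only illustrates that rule. The function and field names are hypothetical, and it does not describe how es-reindex works.

```python
def apply_replicated(store, doc_id, version, source):
    """Apply an incoming document only if it is newer than the stored copy."""
    current = store.get(doc_id)
    if current is not None and current["version"] >= version:
        # Stale or duplicate update: keep what the copy already holds.
        return False
    store[doc_id] = {"version": version, "source": source}
    return True


copy_of_europe = {}
first = apply_replicated(copy_of_europe, "doc-1", 2, {"title": "newer"})
stale = apply_replicated(copy_of_europe, "doc-1", 1, {"title": "older"})
print(first, stale, copy_of_europe["doc-1"]["source"]["title"])
```

With this rule, updates can arrive late or out of order and the copy still converges on the highest version, provided the writers assign versions consistently (for example, from a timestamp in the source system).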

On Wednesday, June 4, 2014 9:03:37 PM UTC-7, Todd Nine wrote:
> [...] Did you ever find a solution for eventual consistency of documents across EC2 regions?