Elasticsearch replication protocol between datacenters?

joergprante@gmail.com
Hi,

How about BitTorrent: could it be a feasible protocol for future
Elasticsearch replication between datacenters? Is the idea good or
bad? Pros and cons? Any comments welcome.

I just stumbled upon this post, where the BitTorrent protocol is used to
overcome Solr index replication deficiencies (but within a datacenter,
I presume):

http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication/

Jörg

Re: Elasticsearch replication protocol between datacenters?

dadoonet
That's a very interesting article.
They decreased the replication time from 60 min to 6 min by using BitTorrent instead of HTTP!
That's really significant.

David

On 28 Jan 2012, at 10:26, jprante <[hidden email]> wrote:

> Hi,
>
> how about Bittorrent, could it be a feasible protocol for future
> Elasticsearch replication between datacenters? Is the idea good or
> bad? Pros and cons? Any comments welcome.
>
> Just stumbled upon this post where Bittorrent protocol is used to
> overcome solr index replication deficiencies (but within a datacenter
> I presume)
>
> http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication/
>
> Jörg

Re: Elasticsearch replication protocol between datacenters?

Paul Loy
We use BitTorrent for our deploys. It has reduced deployment time to 10% of what it was for us, too.

But I think this is where Solr vs. Elasticsearch becomes interesting. Solr, if I am correct, is master-slave: one index is replicated but not sharded. ES is both sharded and replicated, so an individual replica of an individual shard is much smaller than the entire index, roughly 1/shards of its size.
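
To put rough numbers on the sizing argument above, here is a minimal sketch. The function name and the figures (a 100 GB index, 5 shards) are made up for illustration, and it assumes documents are spread evenly across shards, which real indices only approximate.

```python
def replica_size_gb(index_size_gb: float, num_shards: int) -> float:
    """Approximate size of one shard copy, assuming an even spread of data."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return index_size_gb / num_shards


# Hypothetical numbers: a 100 GB index split into 5 shards.
# Each shard copy that has to move over the wire is about 1/5 of the index.
print(replica_size_gb(100.0, 5))  # 20.0
```

So the unit of transfer in a sharded setup is one shard copy, not the whole index, which is the point being made about Solr's unsharded master-slave replication.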

Second, I think ES already pushes changes to replicas as deltas, rather than needing BitTorrent to hash the data and tell us what has changed.
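
For readers unfamiliar with the hashing being referred to: BitTorrent describes content as SHA-1 hashes of fixed-size pieces, so a receiver that already holds an older copy can skip pieces whose hashes still match. The following is only a sketch of that idea. The helper names and the tiny 4-byte piece size are assumptions for illustration, and this is not how any particular torrent client or Solr setup is implemented.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_size: int) -> List[str]:
    """SHA-1 digest of each fixed-size piece of the data."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def changed_pieces(old: bytes, new: bytes, piece_size: int = 4) -> List[int]:
    """Indices of pieces in `new` that differ from, or do not exist in, `old`."""
    old_hashes = piece_hashes(old, piece_size)
    new_hashes = piece_hashes(new, piece_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


# Only the second piece differs, so only that piece would need to be fetched.
print(changed_pieces(b"aaaabbbbcccc", b"aaaaXbbbcccc"))  # [1]
```

The cost is that the whole file has to be hashed to find out what changed, whereas a system that ships the changes themselves already knows.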

So in the general usage pattern I don't think it would help much. But what about failure cases? Perhaps when you spin up a new node, that's where a BitTorrent protocol could bring some savings in ES?

Paul.

On Sat, Jan 28, 2012 at 2:07 AM, David Pilato <[hidden email]> wrote:
> That's a very interesting article.
> They decreased the replication time from 60 min to 6 min by using BitTorrent instead of HTTP!
> That's really significant.
>
> David

--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy

Re: Elasticsearch replication protocol between datacenters?

kimchy
Administrator
Replication in Elasticsearch is different from Solr's current replication mode: it does not replicate internal index segments (where BitTorrent could make some sense); it replicates operations.
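
As a toy model of what "replicating operations" means, the sketch below forwards each index or delete operation to every replica instead of copying files. The class names, the tuple format for operations, and the in-memory dict standing in for a shard are all hypothetical; this is not Elasticsearch's actual internal API.

```python
class ShardCopy:
    """One copy of a shard, modeled as a dict of doc_id -> document."""

    def __init__(self):
        self.docs = {}

    def apply(self, op):
        kind, doc_id, source = op
        if kind == "index":
            self.docs[doc_id] = source
        elif kind == "delete":
            self.docs.pop(doc_id, None)
        else:
            raise ValueError(f"unknown operation: {kind}")


class ReplicationGroup:
    """A primary plus replicas; every operation is applied to all copies."""

    def __init__(self, num_replicas):
        self.primary = ShardCopy()
        self.replicas = [ShardCopy() for _ in range(num_replicas)]

    def execute(self, op):
        self.primary.apply(op)
        for replica in self.replicas:
            # Only the small operation travels, not index files.
            replica.apply(op)


group = ReplicationGroup(num_replicas=2)
group.execute(("index", "1", {"title": "first"}))
group.execute(("index", "2", {"title": "second"}))
group.execute(("delete", "1", None))
print(group.replicas[0].docs)  # {'2': {'title': 'second'}}
```

In this model there is never a large file to ship during normal indexing, which is why a bulk file-transfer protocol has little to speed up there.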

On Sunday, January 29, 2012 at 11:32 AM, Paul Loy wrote:

> We use BitTorrent for our deploys. It has reduced deployment time to 10% of what it was for us, too.
>
> But I think this is where Solr vs. Elasticsearch becomes interesting. Solr, if I am correct, is master-slave: one index is replicated but not sharded. ES is both sharded and replicated, so an individual replica of an individual shard is much smaller than the entire index, roughly 1/shards of its size.
>
> Second, I think ES already pushes changes to replicas as deltas, rather than needing BitTorrent to hash the data and tell us what has changed.
>
> So in the general usage pattern I don't think it would help much. But what about failure cases? Perhaps when you spin up a new node, that's where a BitTorrent protocol could bring some savings in ES?
>
> Paul.