How to setup an ES-to-ES river?

es_learner
Is it supported?

The objective here is to 'tee' new docs into a secondary index.  My current implementation is to write twice from the client: once to the primary index and once to the secondary.  The primary index gets pruned every month; the secondary is never pruned.
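
For illustration only, the client-side double write described above could be folded into a single request by building one bulk body that targets both indices. This is a minimal sketch, not the poster's code: the index names `primary` and `secondary` and the helper `tee_bulk_body` are hypothetical, sending the body (a POST to `/_bulk`) is left out, and older Elasticsearch releases may also expect a `_type` in the action metadata.

```python
import json


def tee_bulk_body(docs, primary="primary", secondary="secondary"):
    """Build one bulk request body that indexes every doc into both indices.

    docs is an iterable of (doc_id, source_dict) pairs.
    The index names are hypothetical placeholders.
    """
    lines = []
    for doc_id, source in docs:
        for index_name in (primary, secondary):
            # Action line first, then the document source on the next line.
            lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
            lines.append(json.dumps(source))
    # The bulk format is newline-delimited JSON ending with a final newline.
    return "\n".join(lines) + "\n"


body = tee_bulk_body([("1", {"msg": "hello"})])
print(body, end="")
```

Both copies travel in one round trip, but the client still owns the duplication, so a failure on one of the two actions has to be handled by the caller.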

Re: How to setup an ES-to-ES river?

dadoonet
See here: https://github.com/elasticsearch/elasticsearch/issues/1077

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to es_learner's message of 28 Sept. 2012, 02:49.)

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/How-to-setup-an-ES-to-ES-river-tp4023286.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Re: How to setup an ES-to-ES river?

James Boehmer
I'm actually looking for something very similar, which I do not believe is the same as that _source river request.  I need to run two Elasticsearch stacks separately but simultaneously, to segregate internal traffic from external traffic.  With Solr I would set up a single master, and run two sets of slaves load balanced independently.  That way the internal slaves could never be affected by traffic hitting the external slaves, and vice versa.  But with ES is there a way to set up a handful of nodes that are basically their own cluster, but get their data from a master cluster which does not store the _source?

(In reply to David Pilato's message of Thursday, September 27, 2012 9:06:17 PM UTC-4.)

Re: How to setup an ES-to-ES river?

joergprante@gmail.com
Hi,

Can you please elaborate on what kind of "traffic" you mean? Is it data load for indexing, search requests hitting the cluster, or both?
You can set up data-less ES nodes that absorb all the network connection load; that is very easy. You can also dedicate data-less nodes to different ports, if that is what you mean by addressing internal/external traffic.

Jörg

(In reply to James Boehmer's message of Thursday, November 8, 2012 8:41:35 PM UTC+1.)

Re: How to setup an ES-to-ES river?

James Boehmer
It would be solely for querying.  For example, we'd like to have a cluster with 5 shards/1 replica being constantly indexed and queried.  Then we'd like to have a second cluster for serving external query traffic, which would get its data from the first cluster.  The second cluster would have its own complete set of primary/replica shards, separate from the first cluster.  However, we would like it to be indexed passively from the first cluster instead of our having to manually index both clusters simultaneously.  The purpose of the second cluster is to be able to scale and absorb traffic independently from the internal cluster.  It's somewhat important that they not interfere with each other, but I suppose that an entire single cluster could scale to handle all of the traffic anyway.

(In reply to Jörg Prante's message of Thursday, November 8, 2012 7:53:05 PM UTC-5.)

Re: How to setup an ES-to-ES river?

joergprante@gmail.com
Hi James,

You can choose a setup within a single cluster, where the nodes (the cluster members) serve different purposes. No need for a second cluster.

ES nodes can be started in a data-only mode, without an HTTP server, so they never process client requests but only do the heavy lifting.

Proxy nodes can be started without data but with HTTP, so they only process client requests and forward them to the data nodes involved in the queries.

You can start as many proxy nodes and data nodes as you want, so you can scale the nodes along two dimensions.

In my view, if you separate proxy and data nodes into two clusters, there is a lot of hassle. Nodes cannot talk to each other across cluster boundaries. You would have to store your data twice using your client tool alone (while ES can do it for you much more easily with replica levels), and afterwards you would have to keep the data in sync when nodes fail (which is tedious with external client tools, while ES does it for you automatically through replicated shards and allocation control).
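
As a rough illustration of the data-only and proxy node roles described above, the sketch below renders `elasticsearch.yml` lines for each kind of node. It assumes the `node.data`, `http.enabled` and `http.port` settings as they existed in Elasticsearch releases of that era (newer versions configure roles differently); the helper names and the port 9201 are hypothetical.

```python
def node_settings(role, http_port=None):
    """Return elasticsearch.yml settings for a 'data' or 'proxy' node.

    Setting names are an assumption based on older Elasticsearch releases.
    """
    if role == "data":
        # Holds shards, but does not expose the HTTP API to clients.
        settings = {"node.data": True, "http.enabled": False}
    elif role == "proxy":
        # Holds no shards; only accepts client requests and forwards them.
        settings = {"node.data": False, "http.enabled": True}
        if http_port is not None:
            settings["http.port"] = http_port
    else:
        raise ValueError("role must be 'data' or 'proxy'")
    return settings


def to_yaml(settings):
    """Render flat settings as 'key: value' lines with lowercase booleans."""
    def fmt(value):
        return str(value).lower() if isinstance(value, bool) else str(value)
    return "\n".join(f"{key}: {fmt(value)}" for key, value in settings.items())


print(to_yaml(node_settings("data")))
print(to_yaml(node_settings("proxy", http_port=9201)))
```

Giving separate proxy nodes different HTTP ports is one way to keep internal and external clients on separate entry points while they still share the same data nodes.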

Cheers,

Jörg

(In reply to James Boehmer's message of Friday, November 9, 2012 3:15:39 AM UTC+1.)

Re: How to setup an ES-to-ES river?

James Boehmer
Hi Jörg, 

We would like each cluster to do its own heavy lifting.  As for HTTP, we are using a load balancer for insertions and queries, so essentially every node in the clusters gets to serve both purposes in round robin fashion.  I do not think separating the HTTP requests from the heavy lifting of searching is quite what we're looking for in this situation.  But what do you mean by replica levels?  Would that imply creating additional replicas of a shard, and assigning them to specific nodes?

-Jim
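
One possible reading of "replica levels" is the per-index `number_of_replicas` setting. Under that assumption, a minimal sketch of raising it, and of how many shard copies result, might look like this. The index name `docs` and both helper names are hypothetical, and actually sending the request is left out.

```python
import json


def replica_update(index_name, replicas):
    """Return (method, path, body) for an update-index-settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"/{index_name}/_settings", body


def total_shard_copies(primaries, replicas):
    # Every primary shard has `replicas` additional copies.
    return primaries * (1 + replicas)


print(replica_update("docs", 2))
print(total_shard_copies(5, 1), total_shard_copies(5, 2))
```

With 5 primary shards, going from 1 to 2 replicas takes the index from 10 to 15 shard copies. Steering copies onto particular nodes is a separate matter handled by shard allocation settings, which this sketch does not cover.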

(In reply to Jörg Prante's message of Friday, November 9, 2012 5:37:14 AM UTC-5.)