Can I force to index on one specific node of a cluster

Chunlei  Wu
Hi,

     I plan to set up an ES cluster on EC2 like this:

           1:    t1.small (node.data: false, http.enabled: true)      - the front node

           2-4: t1.small (node.data: true, http.enabled: false)      - the workers that serve the queries

           5:    t1.medium (node.data: true, http.enabled: true)    - a more powerful instance to handle index updates


     My indices do not need to be updated in real time. I need to update them regularly (say once a week, with a batch of changes). Ideally, I would like to force the re-indexing to happen only on node #5 (without syncing with the other nodes), and then run some validation tests on the updated indices. If everything is OK, I can have the updated indices replicated to worker nodes 2-4. One thing to note is that while I am re-indexing on node #5, nodes 1-4 should always stay live to serve queries against the old indices.

     Another benefit is that I only need to start node #5 when I need to update the indices. I can shut it down when it's not in use to save cost. For serving queries, my other small instances are sufficient.

     Any thoughts? Or maybe a different cluster setting?

Thanks,

Chunlei



Re: Can I force to index on one specific node of a cluster

ppearcy
I prefer to keep all nodes equivalent and spread the search and indexing load evenly across them, but then I'm running on physical hardware. An offline rebuild node could still have its benefits.

To answer your question, yes, you can use the shard allocation APIs:
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html
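A minimal sketch of how index-level allocation filtering might be applied to this setup. The index name myindex_v2 and the IP 10.0.0.5 for the rebuild node are hypothetical and not taken from the thread. The idea is to create the new index with no replicas and an include filter so its shards can only be placed on that one node, then clear the filter and raise the replica count once validation has passed. Clearing the filter with an empty string is an assumption to check against your version.

```shell
#!/bin/sh
# Hypothetical values; adjust to your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
NEW_INDEX="${NEW_INDEX:-myindex_v2}"
REBUILD_NODE_IP="${REBUILD_NODE_IP:-10.0.0.5}"

# Print a create-index body that pins the index to one node:
# no replicas, and shards allowed only on the IP given as $1.
pinned_index_settings() {
    printf '{"settings":{"index.number_of_replicas":0,"index.routing.allocation.include._ip":"%s"}}\n' "$1"
}

# Print an update-settings body that clears the include filter
# and asks for $1 replicas, so shards can spread to other data nodes.
release_index_settings() {
    printf '{"index.routing.allocation.include._ip":"","index.number_of_replicas":%s}\n' "$1"
}

# Create the pinned index (assumes a reachable cluster).
# The Content-Type header is required by newer versions and harmless on old ones.
create_pinned_index() {
    curl -XPUT "$ES_URL/$NEW_INDEX" -H 'Content-Type: application/json' \
        -d "$(pinned_index_settings "$REBUILD_NODE_IP")"
}

# After validation, let the index replicate to the other data nodes.
release_index() {
    curl -XPUT "$ES_URL/$NEW_INDEX/_settings" -H 'Content-Type: application/json' \
        -d "$(release_index_settings "$1")"
}
```

Typical use would be to call create_pinned_index before the batch load and release_index 1 afterwards. Neither function is run automatically when the script is sourced.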

You'll also likely want to look at aliasing in order to swap in a new index:
http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
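As a rough illustration of the alias swap, the sketch below moves a single alias from the old index to the new one in one _aliases request, so queries that go through the alias never see a gap. The alias myindex and the index names myindex_v1 and myindex_v2 are hypothetical.

```shell
#!/bin/sh
# Hypothetical endpoint; adjust to your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print an _aliases body that moves alias $1 from index $2 to index $3.
# Both actions are sent in one request so the swap is applied together.
alias_swap_body() {
    printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
        "$2" "$1" "$3" "$1"
}

# Send the swap to the cluster (assumes a reachable cluster).
swap_alias() {
    curl -XPOST "$ES_URL/_aliases" -H 'Content-Type: application/json' \
        -d "$(alias_swap_body "$1" "$2" "$3")"
}

# Example (not run automatically):
#   swap_alias myindex myindex_v1 myindex_v2
```

With this approach the front node always queries the alias, and the old index can be deleted once the new one is serving traffic.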

Best Regards,
Paul

On Saturday, January 26, 2013 10:46:17 AM UTC-7, Chunlei Wu wrote:

Re: Can I force to index on one specific node of a cluster

Chunlei  Wu
Thanks a lot. That should work for me. With the hints you gave, I also found that this cluster-wide setting could be useful in my case:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'

That way, I can temporarily exclude a node (probably without changing the index name) and add it back once the indexing is done.
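A small sketch of the exclude and re-include steps as a pair, using the same transient setting as above. Resetting the filter by sending an empty string is an assumption here; newer versions also document null for clearing a setting, so check the behavior on your version.

```shell
#!/bin/sh
# Hypothetical endpoint; adjust to your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a cluster-settings body that excludes the IP given as $1.
# An empty $1 produces an empty filter, intended to lift the exclusion.
exclusion_body() {
    printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}\n' "$1"
}

# Apply the exclusion (or clear it) on the cluster (assumes a reachable cluster).
set_excluded_ip() {
    curl -XPUT "$ES_URL/_cluster/settings" -H 'Content-Type: application/json' \
        -d "$(exclusion_body "$1")"
}

# Example (not run automatically):
#   set_excluded_ip 10.0.0.1   # exclude the node
#   set_excluded_ip ""         # add it back
```

Because the setting is transient, it is also lost on a full cluster restart.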

Chunlei


On Sunday, January 27, 2013 8:34:23 PM UTC-8, ppearcy wrote: