Recommended setup & configuration for 3 servers

doug livesey
Hi, I was wondering if anyone could advise me, or point me to the relevant documentation, on how best to set up elasticsearch to run across 3 servers.
I'd like the 3 instances on the 3 servers to be replicas of each other, to fail over to each other automatically, and to recover and rebuild automatically if one of them falls over.
Thanks very much for all advice,
   Doug.

Re: Recommended setup & configuration for 3 servers

vineeth mohan
I guess the best option is to use load balancers, so that even if 1 machine fails, requests fail over to another one.
Next, set the number of replicas to 1, so that even if 1 machine fails, another one takes over the job.
Setting it to 2 in this context will also help.

Thanks
          Vineeth


Re: Recommended setup & configuration for 3 servers

Clinton Gormley-2
On Mon, 2011-10-17 at 16:42 +0530, Vineeth Mohan wrote:
> I guess the best option is to use load balancers , so that even if 1
> machine fails , it fail overs to someone else.

No need for a load balancer, as long as your client knows about all 3
servers and knows to try the next server in the list if the current
server fails.

The Perl API will do this automatically.  I think the Java API will too.
Your mileage may vary with other clients.

> Next make the number of replica to 1 , so that even if 1 fails ,
> someone else takes up the job.
> Making it to 2 in this contest will also help.

Setting replicas to 2 would mean that all 3 machines have all of your
data, so that 2 machines could die at the same time, and the third would
still have all data.

With 1 replica, there will be 2 copies of all your data.  If one server
dies, and your cluster has enough time (which depends on how much data you
have) to redistribute your shards, then you will be fine.

If 2 servers die at the same time, then you will be missing data.
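
To make the replica arithmetic concrete, here is a minimal Python sketch. It is only an illustration, not part of ES: the function names are invented for this example, and it assumes that every copy of a shard is placed on a different node.

```python
def copies_per_shard(replicas: int) -> int:
    """Total copies of each shard: the primary plus its replicas."""
    return 1 + replicas


def simultaneous_failures_survived(nodes: int, replicas: int) -> int:
    """Number of nodes that can die at once while every shard keeps one copy.

    Assumption: each copy of a shard sits on a different node, so no more
    than `nodes` copies can actually be allocated.
    """
    allocated = min(copies_per_shard(replicas), nodes)
    return allocated - 1


# Compare 1 replica and 2 replicas on a 3-node cluster.
for replicas in (1, 2):
    print(replicas, copies_per_shard(replicas),
          simultaneous_failures_survived(3, replicas))
```

With 3 nodes, 1 replica survives the loss of one node at a time and 2 replicas survive the loss of two.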

clint


Re: Recommended setup & configuration for 3 servers

vineeth mohan
Will Elasticsearch take care of the load balancing part?
Say there are two machines, X and Y, with the same data. If X sees that Y is a little more idle than itself, will it redistribute the load to Y?
If so, how?
I would appreciate it if you could point me to some documentation.

Thanks
           Vineeth


Re: Recommended setup & configuration for 3 servers

dadoonet
In reply to this post by Clinton Gormley-2
> The Perl API will do this automatically.  I think the Java API will too.
Yes, the Java API does it perfectly!

Re: Recommended setup & configuration for 3 servers

doug livesey
Thanks for those responses.
So would I be wrong in assuming that what I want to achieve can be done out-of-the-box with a few configuration options in elasticsearch?
Incidentally, I'm using the HTTP API.


Re: Recommended setup & configuration for 3 servers

Clinton Gormley-2
On Mon, 2011-10-17 at 13:23 +0100, doug livesey wrote:
> Thanks for those responses.
> So would I be wrong in assuming that what I want to achieve can be
> done out-of-the-box with a few configuration options in elasticsearch?

No, you'd be *correct* in assuming that it works out of the box :)

ES forms a cluster automatically.  So the only change you might need to make is
to change your indices from having 1 replica to 2, but that is up to
you.  With 1 replica, the load is distributed; with 2 replicas, all 3
nodes have the same data.
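
As an illustration of changing the replica count over the HTTP API, the sketch below builds a request for the index settings endpoint (PUT /<index>/_settings). The host http://localhost:9200 and the index name myindex are placeholders, and the Content-Type header is an assumption that applies to newer ES releases; check the documentation for the version you run.

```python
import json
import urllib.request


def replica_settings_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name, for illustration only.
request = replica_settings_request("http://localhost:9200", "myindex", 2)
print(request.get_method(), request.full_url)

# To send it to a running node:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.read().decode("utf-8"))
```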

> Incidentally, I'm using the HTTP API.

OK - so your client/application code needs to know about all 3 servers,
and to try the next server in the list if the current server isn't
working.

That's the only bit that you need to handle yourself
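
For a plain HTTP client, the failover described here could look roughly like the following Python sketch. The node URLs in the usage comment are placeholders and the helper names are made up for the example; any HTTP library would do in place of urllib.

```python
import urllib.request
from typing import Callable, Sequence


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read()


def get_with_failover(nodes: Sequence[str], path: str,
                      fetch: Callable[[str], bytes] = http_get) -> bytes:
    """Try each node in order and return the first successful response."""
    last_error = None
    for node in nodes:
        try:
            return fetch(node + path)
        except OSError as error:
            # URLError, HTTPError and socket timeouts are all OSError subclasses,
            # so this sketch also moves on when a node answers with an HTTP error.
            last_error = error
    raise ConnectionError(f"no node answered {path}") from last_error


# Usage (placeholder host names):
# NODES = ["http://es1:9200", "http://es2:9200", "http://es3:9200"]
# body = get_with_failover(NODES, "/_cluster/health")
```

The fetch parameter is injectable so that the failover logic can be exercised without a live cluster.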

clint



Re: Recommended setup & configuration for 3 servers

dadoonet
In reply to this post by doug livesey
I think you should query ES through the admin REST API for information about the nodes in the cluster at regular intervals (every 5 minutes, for example) and then use the first node as your main "server node".
If it fails, use the second one.
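
A rough sketch of the polling idea in Python. The response shape used here (nodes.<id>.http.publish_address, as returned by GET /_nodes/http in recent ES releases) is an assumption; older releases expose the node list under a different path and with different field names, so check the output of your own cluster. The sample data is invented.

```python
import json
from typing import List

# Invented sample of a nodes-info response; real responses carry many more fields.
SAMPLE_RESPONSE = json.loads("""
{"cluster_name": "mycluster",
 "nodes": {
   "node-a": {"http": {"publish_address": "10.0.0.1:9200"}},
   "node-b": {"http": {"publish_address": "10.0.0.2:9200"}},
   "node-c": {}
 }}
""")


def live_http_addresses(nodes_info: dict) -> List[str]:
    """Collect the HTTP address of every node that reports one."""
    addresses = []
    for node in nodes_info.get("nodes", {}).values():
        address = node.get("http", {}).get("publish_address")
        if address:
            addresses.append("http://" + address)
    return sorted(addresses)


print(live_http_addresses(SAMPLE_RESPONSE))
```

The first entry of the returned list would then serve as the main node, and the second as the fallback.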

HTH
David ;-)


Re: Recommended setup & configuration for 3 servers

doug livesey
Ah, so if I wanted automatic failover, etc., I'd need to be using a client (it would be Ruby in my case).


Re: Recommended setup & configuration for 3 servers

vineeth mohan
Won't it be a better idea to use a load balancer instead of having ES do it?
In that case, a change (like adding a new ES node or bringing down the master) needs to be made in only 1 place, right?

Thanks
          Vineeth



Re: Recommended setup & configuration for 3 servers

Jérémie BORDIER
If you use the default ElasticSearch client (the Node one, not the
Transport one), your client will act just as another elasticsearch
node and will be aware of added/removed nodes, of how to best route
your queries, etc. It's the best way to go.

Jérémie




--
Jérémie 'ahFeel' BORDIER

Re: Recommended setup & configuration for 3 servers

doug livesey
In reply to this post by vineeth mohan
So if I did this:
1) Set up elasticsearch on my 3 servers, which are on the same network
2) Gave them the same cluster.name
3) Set node.master and node.data to be true for each of them
4) Told the index I was using to have 3 replicas

That wouldn't achieve what I wanted?


Re: Recommended setup & configuration for 3 servers

dadoonet
In reply to this post by Jérémie BORDIER
100% agree !

David ;-)


Re: Recommended setup & configuration for 3 servers

doug livesey
In reply to this post by Jérémie BORDIER
That definitely sounds like the best way to go, but I'm struggling to understand some of what people are suggesting, sorry.
I'm looking through the docs (and have been for some time), but not really seeing how to do any of this.
Could people suggest some of the config settings I need to research to better understand some of the suggestions?


Re: Recommended setup & configuration for 3 servers

doug livesey
In reply to this post by dadoonet
PS -- Sorry if I'm being dense. :)


Re: Recommended setup & configuration for 3 servers

Clinton Gormley-2
In reply to this post by vineeth mohan
On Mon, 2011-10-17 at 18:29 +0530, Vineeth Mohan wrote:
> Wont it be a better idea to use a load balancer instead of ES do it.
> In that case , the change (like adding a new ES node or bringing down
> the master) needs to be made in only 1 place right.

A load balancer is one option, but frankly, if your client already
handles this issue, then you are adding a redundant layer.

The Perl client API, for example, accepts a list of potential nodes.

When connecting to the cluster for the first time, it tries each node in
the list in turn, until it gets a successful response.

Then it uses the cluster API to retrieve a list of all live nodes that
the cluster knows about.

It round-robins through the list of live nodes (to spread the load
between servers) and if any node fails, it tries to refresh the list of
live servers again.  (It also refreshes the live list every $x
requests).

https://metacpan.org/source/DRTECH/ElasticSearch-0.46/lib/ElasticSearch/Transport.pm#L191

You can also configure the Perl client to not retrieve the live list,
but just to round-robin and failover using the provided list of nodes.
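
The behaviour described above can be sketched as follows. This is not the Perl client's actual code, just a simplified Python illustration of the same idea; the class name, the two injected callables and the refresh_every default are all invented for the example.

```python
from typing import Callable, List, Sequence


class RoundRobinClient:
    """Round-robin over live nodes, refreshing the list on failure."""

    def __init__(self,
                 seed_nodes: Sequence[str],
                 fetch: Callable[[str, str], bytes],
                 list_live_nodes: Callable[[str], List[str]],
                 refresh_every: int = 100) -> None:
        self.seed_nodes = list(seed_nodes)
        self.fetch = fetch                      # fetch(node, path); raises OSError if the node is down
        self.list_live_nodes = list_live_nodes  # asks one node for the cluster's live nodes
        self.refresh_every = refresh_every
        self.live_nodes: List[str] = []
        self.position = 0
        self.requests_since_refresh = 0

    def refresh(self) -> None:
        """Try known nodes in turn until one returns a non-empty live list."""
        for node in self.live_nodes + self.seed_nodes:
            try:
                nodes = list(self.list_live_nodes(node))
            except OSError:
                continue
            if nodes:
                self.live_nodes = nodes
                self.requests_since_refresh = 0
                return
        raise ConnectionError("no node could be reached")

    def request(self, path: str) -> bytes:
        """Send one request, rotating through the live nodes."""
        if not self.live_nodes or self.requests_since_refresh >= self.refresh_every:
            self.refresh()
        for _ in range(len(self.live_nodes)):
            node = self.live_nodes[self.position % len(self.live_nodes)]
            self.position += 1
            try:
                response = self.fetch(node, path)
            except OSError:
                self.refresh()  # a node failed: rebuild the live list, then try the next one
                continue
            self.requests_since_refresh += 1
            return response
        raise ConnectionError(f"all nodes failed for {path}")
```

Passing fetch and list_live_nodes in as callables keeps the rotation logic independent of any particular HTTP library.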

clint




Re: Recommended setup & configuration for 3 servers

Clinton Gormley-2
In reply to this post by doug livesey
On Mon, 2011-10-17 at 14:05 +0100, doug livesey wrote:
> So if I did this:
> 1) Setup elasticsearch on my 3 servers, which are on the same network
> 2) Gave them the same cluster.name
> 3) Set node.master and node.data to be true for each of them
> 4) Told the index I was using to have 3 replicas
>
> That wouldn't achieve what I wanted?

That is exactly what you need to do on the server side, except 2
replicas, not 3. You have primary + 2 replicas = 3 in total.

So that's all you need on the ES side.

The bit that is missing is on the client side.  It needs to know about
all the nodes you have, otherwise if it is only talking to one node and
that node goes down, then it can't failover.

The alternative to doing it in the client would be, as Vineeth suggests,
to use a load balancer which does know about all nodes.

But then what happens if your load balancer goes down ;)

clint




Re: Recommended setup & configuration for 3 servers

doug livesey
Right, and the HTTP API doesn't handle failover, so I (or a more featured client) would have to handle that. Okay, thanks.
How do the nodes on my 3 servers know about each other?
So that when I index to one, the others know about it, too, to replicate it.
Or don't they?
Again, sorry if I'm being dense, I do seem to have made a number of false assumptions about the features available from the HTTP API.
Cheers,
   Doug.


Re: Recommended setup & configuration for 3 servers

Clinton Gormley-2
On Mon, 2011-10-17 at 14:24 +0100, doug livesey wrote:
> Right, and the HTTP API doesn't handle failover, so I (or a more
> featured client) would have to handle that. Okay, thanks.
> How do the nodes on my 3 servers know about each other?
> So that when I index to one, the others know about it, too, to
> replicate it.
> Or don't they?

They do, and without any further configuration.

All you have to do is to make sure that:
1) each node has the same cluster name and
2) the nodes can see each other via port 9300
3) multicast is enabled on your network (or you can configure your
   nodes to use unicast to discover each other)
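
The second point can be checked from each server with a small script like this one. It is a sketch that only tests whether a TCP connection to port 9300 succeeds; it says nothing about multicast, and the host names in the usage comment are placeholders.

```python
import socket


def can_reach(host: str, port: int = 9300, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


# Usage (placeholder host names), run on each of the 3 servers:
# for host in ("es1", "es2", "es3"):
#     print(host, can_reach(host))
```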

> Again, sorry if I'm being dense, I do seem to have made a number of
> false assumptions about the features available from the HTTP API.

Note: this is not a failure of the HTTP API in ES, but it is the client
you are using which is missing this feature.

As long as your client can speak to the HTTP API of any live node, you
are fine.  The problem is if you only speak to one node, and that node
dies, then your client doesn't know how to speak to the other nodes.

clint





Re: Recommended setup & configuration for 3 servers

doug livesey
Okay, I'm getting a bit clearer, thank you! :)
So if I installed elasticsearch on my 3 servers (all on the same network, with multicast enabled (is that an apt-get install?)), used the same cluster name for them, and they could all see each other on port 9300, would ...
1) An index created on one with 2 replicas automatically replicate across the 3 servers?
2) A document indexed to one server automatically be replicated to 2 other replicas on the other two servers?
3) A fallen-over node be able to bring itself back up, repair itself, and add itself back into the cluster? Would the service wrapper do this?

& thanks again for taking the time to answer my questions.
