Hi,
I'm having some trouble configuring ES in the cloud. Most of the time everything works, but sometimes discovery fails and I end up with two masters using the same cluster name. This happens on roughly 1 out of 10 startups.

I'm using 0.17.4 embedded, and the configuration looks like this:

```yaml
cluster:
  name: default-cluster-name

index:
  number_of_shards: 2
  number_of_replicas: 1

discovery:
  type: ec2
  zen:
    minimum_master_nodes: 1

cloud:
  aws:
    access_key: XXXXXXXXXX
    secret_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

Trace logs can be found here: https://gist.github.com/1134288. Any ideas what I am missing?

Thanks in advance,
Pavel
It seems like the two nodes ended up not seeing each other properly, thus each elected itself as the master. If you increase the ping_timeout (it defaults to 3s) then it should go away. Set discovery.zen.ping.timeout to something like 10s or 20s.
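For reference, that setting in elasticsearch.yml would look something like the sketch below, matching the config layout from the original post. Note the exact key name has varied (it appears in this thread as both `discovery.zen.ping.timeout` and `discovery.zen.ping_timeout`), so check the documentation for your version:

```yaml
discovery:
  zen:
    ping:
      timeout: 10s   # default is 3s; raising it to 10s-20s gives slow EC2 pings time to complete
```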
On Tue, Aug 9, 2011 at 6:19 PM, Pavel Penchev <[hidden email]> wrote:
Shay,
If two nodes do participate in a network partition (even on a local network), and thus each promotes itself to master status, what happens when they see each other again?

Jason
Nothing, they will remain partitioned, and you will need to decide which one to restart. The minimum_master_nodes is there to help reduce chances of it happening. (on a 2 node cluster though, this setting does not mean much).
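As a hedged sketch of how `minimum_master_nodes` helps on larger clusters: with three master-eligible nodes, setting it to a majority (two) means a lone partitioned node cannot elect itself master, since it cannot see a quorum:

```yaml
discovery:
  zen:
    minimum_master_nodes: 2   # majority of 3 master-eligible nodes: (3 / 2) + 1 = 2
```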
On Tue, Aug 9, 2011 at 11:06 PM, jjasinek <[hidden email]> wrote:
Hi,
sorry to bump an old thread; for completeness, I just want to confirm that setting discovery.zen.ping_timeout to 15s works like a charm in my case.

Many thanks for the quick response,
Pavel
Hi Pavel, the timeout value you need will often increase with the number of non-cluster nodes you have under EC2 management. At least that has been my experience.
A trick to keep it working well is to make sure that all the nodes that are in your ElasticSearch cluster are part of the same EC2 group. Then use the ES groups setting to limit those nodes that ES looks for to establish membership.
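A sketch of what that might look like in elasticsearch.yml, assuming a hypothetical security group name `my-es-group` (check the EC2 discovery documentation for your version for the exact key):

```yaml
discovery:
  type: ec2
  ec2:
    groups: my-es-group   # hypothetical group name; only instances in this EC2 security group are pinged
```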
Two notes on that: you can also use EC2 tags to filter down the list of instances that need to be pinged, and in 0.17 the unicast discovery is considerably more lightweight than in previous versions.
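A sketch of tag-based filtering, assuming a hypothetical tag key and value (the key under `discovery.ec2.tag` is whatever tag you have applied to your instances):

```yaml
discovery:
  type: ec2
  ec2:
    tag:
      escluster: production   # hypothetical: only instances tagged escluster=production are pinged
```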
On Tue, Aug 16, 2011 at 10:17 PM, James Cook <[hidden email]> wrote:
Thanks James, we'll make use of the setting. Indeed the production EC2 environment is quite heterogeneous.

Pavel
Hi James (or anybody else with similar experience),

Given that getting ES to work well under EC2 seems to present a bit of a challenge, how would you feel about writing a tutorial for elasticsearch.org? It would be an invaluable resource.

clint
I think that would be useful as well. I'll try to carve out some time to get something started.
And Clinton, a cookbook of search recipes would be awesome to see on a web page. :)
You have solved many gotchas for people over the past months.
On Fri, 2011-08-19 at 06:01 -0700, James Cook wrote:
> And Clinton, a cookbook of search recipes would be awesome to see on a web page. :)

touché ;)