ec2 discovery stopped working!

Eric Jain
Since yesterday, new client-only nodes I bring up can no longer discover existing nodes on different machines ("waited for 30s and no initial state was set by the discovery").

What's odd is that I haven't changed anything in my setup (elasticsearch 0.90.2, elasticsearch-cloud-aws 1.12.0, S3 gateway), and the machines are in the same security group and can see each other (i.e. I can connect to port 9200).

I've seen this problem previously, but only sporadically.

If I bring up a 2nd data node, rather than just a client node, the 1st data node appears to get stuck in an "auto expanded replicas" loop and eventually runs out of memory...

2013-08-01 10:47:41,620 [INFO] cluster.metadata - [Blindside] updating number_of_replicas to [1] for indices [xyz]
2013-08-01 10:47:41,640 [INFO] cluster.metadata - [Blindside] [xyz] auto expanded replicas to [1]
...

I tried setting "discovery.ec2.ping_timeout" to "15s", but that doesn't make any difference. Other ideas?
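The "auto expanded replicas to [1]" lines are consistent with an index that uses the `index.auto_expand_replicas` setting: when a second data node joins, the replica count is raised to the number of data nodes minus one, within the configured range. A minimal Python sketch of that arithmetic, assuming a "min-max" range string such as "0-all" (the function name and the example range are illustrative, not taken from the poster's configuration):

```python
def expected_replicas(auto_expand: str, data_nodes: int) -> int:
    """Replica count implied by an auto_expand_replicas range like "0-all" or "0-2".

    Assumption (sketch only): the target is data_nodes - 1, clamped to [low, high],
    where an upper bound of "all" means "no bound beyond data_nodes - 1".
    """
    low_s, high_s = auto_expand.split("-")
    low = int(low_s)
    target = max(data_nodes - 1, 0)
    high = target if high_s == "all" else int(high_s)
    return max(low, min(target, high))


# One data node -> no replicas; a second data node joining -> one replica,
# which matches the "auto expanded replicas to [1]" log line.
print(expected_replicas("0-all", 1))  # 0
print(expected_replicas("0-all", 2))  # 1
```

This models only the target replica count; it doesn't by itself explain why the update kept repeating.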

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: ec2 discovery stopped working!

Eric Jain
Should add that I can telnet from the client machine to port 9300 on the machine running elasticsearch.

What else could I check?
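The connectivity checks mentioned in this thread (connecting to port 9200, and telnet to the transport port 9300) can be scripted so they are easy to repeat from every new node. A minimal Python sketch using only the standard library; the helper names are hypothetical, and a successful connection shows only that the port is reachable, which, as this thread shows, doesn't by itself mean discovery will succeed:

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This is roughly what a manual `telnet host port` check verifies: the port is
    reachable and something is listening. It says nothing about whether the node
    on the other end will accept the new node into its cluster.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_node(host: str) -> dict:
    """Check the default HTTP (9200) and transport (9300) ports on one host."""
    return {port: port_open(host, port) for port in (9200, 9300)}
```

For example, `check_node("10.0.0.5")`, where the address is a placeholder for an existing node's private IP.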


On Thursday, August 1, 2013 2:23:12 PM UTC-7, Eric Jain wrote:

Re: ec2 discovery stopped working!

Eric Jain
Reposted here:

  http://stackoverflow.com/questions/18157096/unreliable-discovery-for-elasticsearch-nodes-on-ec2


On Monday, August 5, 2013 4:43:06 PM UTC-7, Eric Jain wrote:

Re: ec2 discovery stopped working!

baniu.yao
In reply to this post by Eric Jain
Well, if your nodes are not in the same class C network (for example, master: 10.1.1.1, node: 10.1.2.1), you need to list the hosts explicitly, like this: 'discovery.zen.ping.unicast.hosts: ["10.19.1.1"]'
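Whether two addresses share a class C (/24) network, as in the example above, can be checked with Python's standard `ipaddress` module. A minimal sketch; the helper name is hypothetical, and it only tests the premise of the suggestion, not whether unicast hosts are actually required with the cloud-aws plugin:

```python
import ipaddress


def same_network(a: str, b: str, prefix: int = 24) -> bool:
    """Return True if IPv4 addresses a and b fall in the same /prefix network.

    prefix=24 corresponds to an old "class C" sized network, prefix=8 to "class A".
    """
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in network


print(same_network("10.1.1.1", "10.1.2.1"))             # False: different /24 networks
print(same_network("10.1.1.1", "10.1.2.1", prefix=16))  # True: same /16
```

Passing `prefix=8` compares only the first octet, so addresses such as 23.20.43.x and 54.221.47.x come out as different networks.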

On Friday, August 2, 2013 5:23:12 AM UTC+8, Eric Jain wrote:

Re: ec2 discovery stopped working!

Eric Jain
On Mon, Aug 12, 2013 at 1:27 AM, 姚仁捷 <[hidden email]> wrote:
> Well, if your node in not in the same C address IP (for example, master:
> 10.1.1.1, node: 10.1.2.1), you need to specify the params like this:
> 'discovery.zen.ping.unicast.hosts: ["10.19.1.1"]
> '

Right now I have a healthy cluster with nodes in different class A networks
(e.g. 23.20.43.x and 54.221.47.x), so I don't think the
'discovery.zen.ping.unicast.hosts' parameter is required when using
the cloud-aws plugin. Other ideas?


Re: ec2 discovery stopped working!

Eric Jain
In reply to this post by Eric Jain
Just ran into this again (elasticsearch 0.90.3, elasticsearch-cloud-aws 1.14.0): New nodes ignore existing nodes. The only solution appears to be to shut down the entire cluster first :-(

Looking at the (TRACE-level) logs, the old nodes do seem to be discovered at first, but then the new node elects itself as master, http://localhost:9200/_cluster/health reports a single data node, and shards are restored from S3!

Any ideas (other than don't use the ec2 discovery mechanism)?
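One way to catch the symptom described here (the new node reporting a single data node while the old nodes still see the full cluster) is to ask every node for its own view via `_cluster/health` and compare the results. A minimal Python sketch; the helper names and host addresses are hypothetical, and it compares only fields that `_cluster/health` returns (`cluster_name`, `number_of_nodes`, `number_of_data_nodes`):

```python
import json
from urllib.request import urlopen


def cluster_health(host: str, port: int = 9200, timeout: float = 5.0) -> dict:
    """Fetch GET /_cluster/health from one node's HTTP endpoint and decode the JSON."""
    with urlopen(f"http://{host}:{port}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def views_disagree(healths: dict) -> bool:
    """Return True if the nodes do not all report the same cluster membership counts.

    `healths` maps a node address to that node's decoded _cluster/health response.
    Nodes that joined one cluster should agree on these fields; a new node that
    elected itself master would typically report fewer nodes than the rest.
    """
    views = {
        (h["cluster_name"], h["number_of_nodes"], h["number_of_data_nodes"])
        for h in healths.values()
    }
    return len(views) > 1
```

For example, `views_disagree({host: cluster_health(host) for host in ["10.0.0.5", "10.0.0.6"]})`. Counts can differ briefly while a node is joining, so a mismatch is worth re-checking before acting on it.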


On Thursday, August 1, 2013 2:23:12 PM UTC-7, Eric Jain wrote:

Re: ec2 discovery stopped working!

Eric Jain
On Thursday, October 3, 2013 1:00:37 AM UTC-7, Eric Jain wrote:
Just ran into this again (elasticsearch 0.90.3, elasticsearch-cloud-aws 1.14.0): New nodes ignore existing nodes. The only solution appears to be to shut down the entire cluster first :-(

For the record, this problem resurfaced again after a few weeks (now using elasticsearch 0.19.7 and elasticsearch-cloud-aws 1.16.0)--I still don't know what the cause is.


Re: ec2 discovery stopped working!

Andrej
According to the docs, aws plugin 1.16.0 is for elasticsearch 0.90.4 and higher. I would not expect it to run with 0.19.7.

Greets
Andrej
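The compatibility question comes down to a numeric, component-by-component version comparison, which is also why the typo matters: 0.19.7 is far below the 0.90.4 minimum, while 0.90.7 meets it. A minimal Python sketch (helper names are hypothetical; the 0.90.4 minimum is as read from the docs in the message above):

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "0.90.7" into a tuple of ints: (0, 90, 7)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    """Return True if `version` >= `minimum`, comparing component by component."""
    return parse_version(version) >= parse_version(minimum)


print(meets_minimum("0.19.7", "0.90.4"))  # False: 19 < 90 in the second component
print(meets_minimum("0.90.7", "0.90.4"))  # True
```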

On Wednesday, December 4, 2013 03:56:38 UTC+1, Eric Jain wrote:

Re: ec2 discovery stopped working!

dadoonet
I think Eric is using 0.90.7 and not 0.19.7…

:-)

-- 
David Pilato | Technical Advocate | Elasticsearch.com


On December 4, 2013 at 12:49:11, Andrej Rosenheinrich ([hidden email]) wrote:


Re: ec2 discovery stopped working!

Eric Jain
On Wed, Dec 4, 2013 at 6:11 AM, David Pilato <[hidden email]> wrote:
> I think Eric is using 0.90.7 and not 0.19.7…

Yes, sorry for the confusion; I wish the problem was that simple :-)


Re: ec2 discovery stopped working!

Eric Jain
In reply to this post by Eric Jain
On Tuesday, December 3, 2013 6:56:38 PM UTC-8, Eric Jain wrote:
For the record, this problem resurfaced again after a few weeks (now using elasticsearch 0.19.7 and elasticsearch-cloud-aws 1.16.0)--I still don't know what the cause is.

Here is the log file (leading up to the "org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]" exception):

  https://gist.github.com/ejain/4b2a57f4ff4cbaea0dec

There are two other machines running, each with a client-only node and a data node, and there's nothing obviously wrong with the cluster:

curl -XGET 'http://10.10.209.204:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "prod-39",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 226,
  "active_shards" : 452,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

But as mentioned previously, the only way to recover from this situation is to shut down all nodes (or copy the data and start a new cluster).
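The health output is consistent with the layout described: 4 nodes minus 2 data nodes leaves the 2 client-only nodes, and 452 active shards over 226 primaries works out to one replica per primary. A short Python sketch of that arithmetic, using the JSON above trimmed to the fields it needs (the function name is illustrative):

```python
import json

# Cluster health output quoted above, trimmed to the fields used here.
HEALTH_JSON = """{
  "cluster_name" : "prod-39",
  "status" : "green",
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 226,
  "active_shards" : 452,
  "unassigned_shards" : 0
}"""


def summarize(health: dict) -> dict:
    """Derive the non-data node count and average replicas per primary."""
    primaries = health["active_primary_shards"]
    return {
        # Non-data nodes; in the setup described these are the client-only nodes.
        "client_nodes": health["number_of_nodes"] - health["number_of_data_nodes"],
        "replicas_per_primary": (health["active_shards"] - primaries) / primaries,
    }


print(summarize(json.loads(HEALTH_JSON)))  # {'client_nodes': 2, 'replicas_per_primary': 1.0}
```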


Re: ec2 discovery stopped working!

Mark Walkom
It might be worth upgrading to 0.90.X; from what I have seen, there were some major improvements in discovery.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 5 December 2013 15:07, Eric Jain <[hidden email]> wrote:

Re: ec2 discovery stopped working!

Eric Jain
On Wed, Dec 4, 2013 at 8:09 PM, Mark Walkom <[hidden email]> wrote:
> It might be worth upgrading to 0.90.X, from what I have seen there was some
> major improvements in discovery.

As mentioned above, I am in fact using the latest (production) version
of both elasticsearch (0.90.7) and the elasticsearch-cloud-aws plugin
(1.16.0).


Re: ec2 discovery stopped working!

Mark Walkom
Ah, the quote with "now using elasticsearch 0.19.7" threw me.



On 5 December 2013 15:14, Eric Jain <[hidden email]> wrote:

Re: ec2 discovery stopped working!

Eric Jain
On Wed, Dec 4, 2013 at 8:16 PM, Mark Walkom <[hidden email]> wrote:
> Ah, the quote with "now using elasticsearch 0.19.7" threw me.

Yeah, I shouldn't have quoted my own typo again :-)
