Can't stop a snapshot running on my cluster

Andrew Vos
A few days ago I started a snapshot, but instead of using a shared network filesystem I used the local filesystem. My root partition, which is where I stored the snapshots, only had 8 GB, so it filled up and three of my seven Elasticsearch boxes crashed almost instantly.

I've since created a new cluster and let the data replicate over. The problem now is, I can't seem to cancel this snapshot!

Looking at the snapshot:
curl -XGET "localhost:9999/_snapshot/production_backup/_snapshot1"
{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/get]]; nested: SnapshotMissingException[[production_backup:_snapshot1] is missing]; nested: FileNotFoundException[/ebs/snapshot-backup/snapshot-_snapshot1 (No such file or directory)]; ","status":404}

Starting a new snapshot:
curl -XPUT "localhost:9999/_snapshot/production_backup/snapshot_1?wait_for_completion=false"
{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/create]]; nested: ConcurrentSnapshotExecutionException[[production_backup:snapshot_1] a snapshot is already running]; ","status":503}

This just never completes:

curl -XDELETE "localhost:9999/_snapshot/production_backup/snapshot_1"


Any idea how I can solve this?
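
For readers following along, the two error bodies above can be taken apart mechanically. The following is a minimal Python sketch, not part of the original post: `summarize_error` is a hypothetical helper, and its regular expressions assume the 1.x-style error strings shown above. One thing it makes visible is that the GET request asked for `_snapshot1`, while the create and delete requests used `snapshot_1`, which may be why the GET returned a 404.

```python
import json
import re

# Error bodies copied from the commands above (only the JSON part).
GET_RESPONSE = '{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/get]]; nested: SnapshotMissingException[[production_backup:_snapshot1] is missing]; nested: FileNotFoundException[/ebs/snapshot-backup/snapshot-_snapshot1 (No such file or directory)]; ","status":404}'
PUT_RESPONSE = '{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/create]]; nested: ConcurrentSnapshotExecutionException[[production_backup:snapshot_1] a snapshot is already running]; ","status":503}'


def summarize_error(body):
    """Hypothetical helper: pull status, exception names and snapshot refs from an error body."""
    doc = json.loads(body)
    message = doc["error"]
    # Exception class names appear as "SomethingException[" in the message.
    exceptions = re.findall(r"(\w+Exception)\[", message)
    # Snapshot references appear as "[repository:snapshot]".
    snapshots = re.findall(r"\[(\w+):(\w+)\]", message)
    return {"status": doc["status"], "exceptions": exceptions, "snapshots": snapshots}


for label, body in (("GET", GET_RESPONSE), ("PUT", PUT_RESPONSE)):
    info = summarize_error(body)
    print(label, info["status"], info["exceptions"][-1], info["snapshots"])
```

Comparing the `snapshots` entries of the two results shows the differing snapshot names side by side.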

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/af485e13-dc74-4e88-b6db-e3a4d67fb00c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Can't stop a snapshot running on my cluster

Igor Motov-3
Which version of Elasticsearch are you using? Can you send me the current cluster state?

In reply to Andrew Vos's message of Saturday, May 24, 2014 10:17:43 AM UTC-4.

Re: Can't stop a snapshot running on my cluster

Andrew Vos
Version 1.0.0. What exactly do you mean by the cluster state?


In reply to Igor Motov's message of Sat, May 24, 2014 at 6:33 PM.

Re: Can't stop a snapshot running on my cluster

Andrew Vos
Right, OK, here's the cluster state: https://gist.github.com/AndrewVos/29de3c6735bbd7808a81

In reply to Andrew Vos's message of Sat, May 24, 2014 at 7:18 PM.

Re: Can't stop a snapshot running on my cluster

Igor Motov-3
In reply to this post by Andrew Vos
I meant the output of the cluster state command:

curl -XGET 'http://localhost:9200/_cluster/state'

It might be large and will contain information about your cluster that you might not want to share publicly (index mappings). If this is the case, please feel free to send it to me by email.
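
If the mappings are the only sensitive part, one option is to blank them out before sharing the state. Below is a minimal Python sketch, assuming the JSON returned by `_cluster/state` keeps per-index mappings under `metadata.indices.<index>.mappings` (worth verifying against your own output). `redact_mappings` and the sample data are hypothetical, and other sections such as index templates or settings may need the same treatment.

```python
import copy
import json


def redact_mappings(cluster_state):
    """Return a copy of the cluster state with every index's mappings emptied."""
    redacted = copy.deepcopy(cluster_state)
    indices = redacted.get("metadata", {}).get("indices", {})
    for index_meta in indices.values():
        if "mappings" in index_meta:
            index_meta["mappings"] = {}
    return redacted


# Hypothetical, heavily trimmed stand-in for a real cluster state document.
sample_state = {
    "cluster_name": "example",
    "metadata": {
        "indices": {
            "logs": {
                "state": "open",
                "mappings": {"event": {"properties": {"user": {"type": "string"}}}},
            }
        }
    },
}

print(json.dumps(redact_mappings(sample_state), indent=2))
```

In practice the input would come from saving the curl output to a file and loading it with `json.load`.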

Igor


In reply to Andrew Vos's message of Saturday, May 24, 2014 2:18:14 PM UTC-4.

Re: Can't stop a snapshot running on my cluster

Igor Motov-3
In reply to this post by Andrew Vos
It was caused by this bug: https://github.com/elasticsearch/elasticsearch/issues/5958
The only recovery option right now is a full cluster restart.

In reply to Andrew Vos's message of Saturday, May 24, 2014 2:30:06 PM UTC-4.

Re: Can't stop a snapshot running on my cluster

Andrew Vos
OK. While you're here, I have one other question:

I have 10 nodes in a cluster. I want to break three of them out into a separate cluster as a kind of backup, so I can test this full cluster restart. Would it be safe to just block those three nodes from connecting to the main cluster? Would they form their own cluster?


In reply to Igor Motov's message of Sat, May 24, 2014 at 7:35 PM.

Re: Can't stop a snapshot running on my cluster

Igor Motov-3
If your cluster is set up correctly (with a proper value for discovery.zen.minimum_master_nodes), they shouldn't. But if you are running without discovery.zen.minimum_master_nodes set, they might indeed form a new cluster. Some shards might then end up in one cluster and not in the other, and if you are indexing while this is happening you will lose some data. I would say it's a pretty... extreme way to test a full cluster restart.
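
The usual recommendation for discovery.zen.minimum_master_nodes is a quorum of the master-eligible nodes, (N / 2) + 1 with integer division. Here is a small Python sketch of that arithmetic, assuming all 10 nodes are master-eligible (the thread does not say); the function names are illustrative only.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible, quorum):
    """A partition can elect a master only if it sees at least a quorum of nodes."""
    return visible_master_eligible >= quorum


quorum = minimum_master_nodes(10)
print(quorum)                       # 6
print(can_elect_master(3, quorum))  # False: the three isolated nodes stay masterless
print(can_elect_master(7, quorum))  # True: the remaining seven keep running
```

With the setting at 6, the three blocked nodes could not form their own cluster, which matches the "they shouldn't" case above.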

In reply to Andrew Vos's message of Saturday, May 24, 2014 2:38:44 PM UTC-4.

Re: Can't stop a snapshot running on my cluster

Andrew Vos
Well, it's the only way I can do it without downtime, unless by "full cluster restart" you mean restarting one node at a time?


On Sat, May 24, 2014 at 7:51 PM, Igor Motov <[hidden email]> wrote:
If your cluster is setup correctly (with proper value set for discovery.zen.minimum_master_nodes) they shouldn't. But if you are running without discovery.zen.minimum_master_nodes set, they might indeed form a new cluster. Obviously some shards might end up in one cluster and not in the other and if you are indexing while this is happening you will lose some data. I would say it's pretty..... extreme way to test full cluster restart. 




Re: Can't stop a snapshot running on my cluster

Igor Motov-3
Yes, by "full cluster restart" I meant shutting down all nodes and then starting them up again, which means downtime. However, after thinking about the issue over the long weekend, I wrote a simple utility that cleans up snapshots without needing to restart the cluster - https://github.com/imotov/elasticsearch-snapshot-cleanup

