What is the best practice for periodic snapshotting with cloud-aws + S3?

Pradeep Reddy
I want to back up the data every 15-30 minutes. I will be storing the snapshots in S3.

Deleting the old snapshot (DELETE) and then creating a new one (PUT) may not be the best practice, as you may end up with nothing if something goes wrong.

Using a timestamp in the snapshot name may be one option, but how would I delete old snapshots then?
Would S3 lifecycle management help to delete old snapshots?
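
One way to avoid the delete-then-put gap is to give every snapshot a unique timestamped name, wait for the new snapshot to finish, and only then remove older ones. Below is a minimal sketch of the naming and request-building side. The host localhost:9200, the repository name s3_backup, and the helper names are all assumptions made for illustration. The code only builds URLs for the standard snapshot REST endpoints and does not send anything:

```python
from datetime import datetime
from urllib.parse import quote

# Hypothetical values for illustration; adjust to your cluster.
ES_URL = "http://localhost:9200"
REPO = "s3_backup"


def snapshot_name(now: datetime) -> str:
    """Build a unique, sortable, lowercase snapshot name from a timestamp."""
    return now.strftime("snapshot-%Y.%m.%d-%H.%M.%S")


def create_url(name: str) -> str:
    """URL for a PUT that creates the snapshot and blocks until it completes."""
    return f"{ES_URL}/_snapshot/{REPO}/{quote(name)}?wait_for_completion=true"


def delete_url(name: str) -> str:
    """URL for a DELETE that removes one snapshot by name."""
    return f"{ES_URL}/_snapshot/{REPO}/{quote(name)}"


name = snapshot_name(datetime(2014, 11, 7, 15, 22, 0))
print(name)
print(create_url(name))
```

If older snapshots are deleted only after the PUT for the new one has returned successfully, a failed snapshot never leaves the repository empty.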

I look forward to hearing some opinions on this.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0dd81d83-5066-4652-9703-dfce63b46993%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

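Re: What is the best practice for periodic snapshotting with cloud-aws + S3

vineeth mohan-2
Hi,

There is an S3 repository plugin: https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
Use this.
The snapshots are incremental, so it should fit your purpose perfectly.

Thanks,
Vineeth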

Re: What is the best practice for periodic snapshotting with cloud-aws + S3

Pradeep Reddy
Hi Vineeth,

Thanks for the reply.
I am aware of how to create and delete snapshots using cloud-aws.

What I wanted to know is what the workflow for periodic snapshots should look like, especially how to deal with old snapshots. Will having too many old snapshots have any impact?

Re: What is the best practice for periodic snapshotting with cloud-aws + S3

Sally Ahn
I am also interested in this topic.
We were snapshotting our two-node cluster every 2 hours (invoked via a cron job) to an S3 repository. We were running ES 1.2.2 with cloud-aws plugin 2.2.0, then upgraded to ES 1.4.0 with cloud-aws plugin 2.4.0, but we are still seeing the issues described below.
Each subsequent snapshot takes longer to complete than the one before it.
I found a thread (https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ) where someone else was seeing the same thing, but that thread seems to have died.
In my case, snapshots have gone from taking ~5 minutes to taking about an hour, even between snapshots where the data does not seem to have changed.

For example, below is a list of the snapshots stored in my S3 repo. Each snapshot is named with the timestamp at which my cron job invoked the snapshot process. The S3 timestamp on the left shows when that snapshot completed, and it is clear that the gap between the two is steadily increasing:

2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01
2014-09-30 12:05       686   s3://<bucketname>/snapshot-2014.09.30-12:00:01
2014-09-30 14:05       736   s3://<bucketname>/snapshot-2014.09.30-14:00:01
2014-09-30 16:05       736   s3://<bucketname>/snapshot-2014.09.30-16:00:01
...
2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01
2014-11-08 02:54      1488   s3://<bucketname>/snapshot-2014.11.08-02:00:01
...
2014-11-08 14:54      1488   s3://<bucketname>/snapshot-2014.11.08-14:00:01
2014-11-08 16:53      1488   s3://<bucketname>/snapshot-2014.11.08-16:00:01
...
2014-11-11 07:00      1638   s3://<bucketname>/snapshot-2014.11.11-06:00:01
2014-11-11 08:58      1638   s3://<bucketname>/snapshot-2014.11.11-08:00:01
2014-11-11 10:58      1638   s3://<bucketname>/snapshot-2014.11.11-10:00:01
2014-11-11 12:59      1638   s3://<bucketname>/snapshot-2014.11.11-12:00:01
2014-11-11 15:00      1638   s3://<bucketname>/snapshot-2014.11.11-14:00:01
2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01
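
The slowdown can be read off the listing by subtracting the invocation time embedded in each snapshot name from the S3 completion timestamp. Here is a small sketch of that arithmetic, assuming the line layout shown above and that both timestamps are in the same time zone (the listing does not say). The function name is made up for illustration:

```python
from datetime import datetime


def snapshot_minutes(listing_line: str) -> float:
    """Minutes between the start time in the snapshot name and the S3 timestamp."""
    date_part, time_part, _size, uri = listing_line.split()
    finished = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M")
    # The object name looks like "snapshot-2014.09.30-10:00:01".
    started_text = uri.rsplit("/", 1)[-1][len("snapshot-"):]
    started = datetime.strptime(started_text, "%Y.%m.%d-%H:%M:%S")
    return (finished - started).total_seconds() / 60


lines = [
    "2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01",
    "2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01",
    "2014-11-11 07:00      1638   s3://<bucketname>/snapshot-2014.11.11-06:00:01",
]
for line in lines:
    print(f"{snapshot_minutes(line):.0f} min")
```

Since the S3 timestamp only has minute resolution, the results are approximate: roughly 5, 52, and 60 minutes for the three sample lines.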

I suspected that this gradual increase was related to the accumulation of old snapshots after I ran the following test:
1. I created a brand new cluster with the same hardware specs in the same datacenter and restored a snapshot of the problematic cluster taken a few days earlier (i.e. not the latest snapshot).
2. I then backed up that restored data to a new, empty bucket in the same S3 region, and that was very fast: a minute or less.
3. I then restored a later snapshot of the problematic cluster to the test cluster and backed it up again to the new bucket, and that also took a minute or less.

However, when I deleted the repository full of old snapshots from the problematic cluster and registered a brand new, empty bucket, my first snapshot to the new repository also hung indefinitely. I finally had to kill my snapshot curl command. There were no errors in the logs (the snapshot logger is very terse; I am wondering if anyone knows how to increase its verbosity).

So my theory seems to have been debunked, and I am again at a loss. I am wondering whether the hanging snapshot is related to the slow snapshots I was seeing before I deleted that old repository. I have seen several GitHub issues regarding hanging snapshots (https://github.com/elasticsearch/elasticsearch/issues/5958, https://github.com/elasticsearch/elasticsearch/issues/7980) and have tried running the elasticsearch-snapshot-cleanup utility (https://github.com/imotov/elasticsearch-snapshot-cleanup) on my cluster both before and after upgrading from 1.2.2 to 1.4.0. I thought that upgrading to 1.4.0, which included snapshot improvements, might fix my issues, but it did not, and the script is not finding any running snapshots:

[2014-11-13 05:37:45,451][INFO ][org.elasticsearch.node   ] [Golden Archer] started
[2014-11-13 05:37:45,451][INFO ][org.elasticsearch.org.motovs.elasticsearch.snapshots.AbortedSnapshotCleaner] No snapshots found
[2014-11-13 05:37:45,452][INFO ][org.elasticsearch.node   ] [Golden Archer] stopping ...

A request to _snapshot/REPO/_status also returns no ongoing snapshots:

curl -XGET 'http://<hostname>:9200/_snapshot/s3_backup_repo/_status?pretty=true'
{
  "snapshots" : [ ]
}
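
A cron-driven job could use the same check to skip a run while a previous snapshot is still in progress. The following is a minimal sketch, assuming the response body has the shape shown above (a top-level "snapshots" array that is empty when nothing is running). The function name is hypothetical:

```python
import json


def snapshot_in_progress(status_body: str) -> bool:
    """Return True if the _status response lists at least one running snapshot."""
    return len(json.loads(status_body).get("snapshots", [])) > 0


idle = '{ "snapshots" : [ ] }'
print(snapshot_in_progress(idle))
```

An empty list did not rule out hanging requests in this case, so this is only a basic guard and not a diagnosis.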

I may try bouncing ES on each node to see if that kills whatever process is causing my requests to the snapshot module to hang. Requests to other modules such as _cluster/health return fine; cluster health is green, and load is low on both nodes (0.00, 0.06).

I would really appreciate some help or guidance on how to debug and fix this issue, as well as general recommendations on how best to achieve periodic snapshots. For example, cleaning up old snapshots seems rather difficult, since we have to specify the snapshot name, which we would obtain by making a request to the snapshot module, which often seems to hang.

Thanks,
Sally


Re: What is the best practice for periodic snapshotting with cloud-aws + S3

Igor Motov-3
Having too many snapshots is problematic. Each snapshot is taken incrementally, so in order to figure out what has changed and what is already available, all snapshots in the repository need to be scanned, which takes longer as the number of snapshots grows. I would recommend pruning old snapshots as time goes by, or starting to snapshot into a new bucket/directory if you really need to maintain 2-hour resolution for 2-month-old snapshots. The get command can sometimes hang because it is throttled by the ongoing snapshot.
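
The pruning recommended here could be automated by parsing the timestamp out of each snapshot name and deleting everything older than a retention window. Deletion should go through the snapshot delete API instead of removing objects from the bucket directly, since incremental snapshots share files. Below is a minimal sketch, assuming the snapshot-YYYY.MM.DD-HH:MM:SS naming used earlier in this thread and a hypothetical 14-day retention window. The function name is illustrative:

```python
from datetime import datetime, timedelta

NAME_FORMAT = "snapshot-%Y.%m.%d-%H:%M:%S"  # naming scheme from this thread


def expired_snapshots(names, now, keep=timedelta(days=14)):
    """Return names older than the retention window, oldest first.

    The newest snapshot is always kept, so pruning never empties the repository.
    Names that do not match NAME_FORMAT are ignored rather than deleted.
    """
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, NAME_FORMAT), name))
        except ValueError:
            continue  # not one of ours; leave it alone
    dated.sort()
    candidates = dated[:-1]  # never consider the newest snapshot
    return [name for taken, name in candidates if now - taken > keep]


names = [
    "snapshot-2014.11.11-16:00:01",
    "snapshot-2014.09.30-10:00:01",
    "snapshot-2014.11.08-00:00:01",
    "manual-backup",
]
for name in expired_snapshots(names, now=datetime(2014, 11, 13)):
    print("DELETE /_snapshot/s3_backup_repo/" + name)
```

In practice the names would come from a GET request to /_snapshot/s3_backup_repo/_all, and each returned name would be passed to a DELETE request against the same repository.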


Re: What is the best practice for periodic snapshotting with cloud-aws + S3

Sally Ahn
Yes, I am now seeing the snapshots complete in about 2 minutes after switching to a new, empty bucket.
I'm not sure why the initial request to snapshot to the empty repo was hanging because the snapshot did in fact complete in about 2 minutes, according to the S3 timestamp.
Time to automate deletion of old snapshots. :)
Thanks for the response!

On Thursday, November 13, 2014 9:35:20 PM UTC-8, Igor Motov wrote:
Having too many snapshots is problematic. Each snapshot is done in incremental manner, so in order to figure out what changes and what is available all snapshots in the repository needs to be scanned, which takes time as number of snapshots growing. I would recommend pruning old snapshots as time goes by or starting snapshots into a new bucket/directory if you really need to maintain 2 hour resolution for 2 months old snapshots. The get command can sometimes hang because it's throttled by the on-going snapshot. 


On Wednesday, November 12, 2014 9:02:33 PM UTC-10, Sally Ahn wrote:
I am also interested in this topic.
We were snapshotting our cluster of two nodes every 2 hours (invoked via a cron job) to an S3 repository (we were running ES 1.2.2 with cloud-aws-plugin version 2.2.0, then we upgraded to ES 1.4.0 with cloud-aws-plugin 2.4.0 but are still seeing issues described below).
I've been seeing an increase in the time it takes to complete a snapshot with each subsequent snapshot. 
I see a <a href="https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ" target="_blank" onmousedown="this.href='https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ';return true;" onclick="this.href='https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ';return true;">thread where someone else was seeing the same thing, but that thread seems to have died.
In my case, snapshots have gone from taking ~5 minutes to taking about an hour, even between snapshots where data does not seem to have changed. 

For example, you can see below a list of the snapshots stored in my S3 repo. Each snapshot is named with a timestamp of when my cron job invoked the snapshot process. The S3 timestamp on the left shows the completion time of that snapshot, and it's clear that it's steadily increasing:

2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01
2014-09-30 12:05       686   s3://<bucketname>/snapshot-2014.09.30-12:00:01
2014-09-30 14:05       736   s3://<bucketname>/snapshot-2014.09.30-14:00:01
2014-09-30 16:05       736   s3://<bucketname>/snapshot-2014.09.30-16:00:01
...
2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01
2014-11-08 02:54      1488   s3://<bucketname>/snapshot-2014.11.08-02:00:01
...
2014-11-08 14:54      1488   s3://<bucketname>/snapshot-2014.11.08-14:00:01
2014-11-08 16:53      1488   s3://<bucketname>/snapshot-2014.11.08-16:00:01
...
2014-11-11 07:00      1638   s3://<bucketname>/snapshot-2014.11.11-06:00:01
2014-11-11 08:58      1638   s3://<bucketname>/snapshot-2014.11.11-08:00:01
2014-11-11 10:58      1638   s3://<bucketname>/snapshot-2014.11.11-10:00:01
2014-11-11 12:59      1638   s3://<bucketname>/snapshot-2014.11.11-12:00:01
2014-11-11 15:00      1638   s3://<bucketname>/snapshot-2014.11.11-14:00:01
2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01

I suspected that this gradual increase was related to the accumulation of old snapshots after I tested the following:
1. I created a brand new cluster with the same hardware specs in the same datacenter and restored a snapshot of the problematic cluster taken few days back (i.e. not the latest snapshot). 
2. I then backed up that restored data to a new empty bucket in the same S3 region, and that was very fast...a minute or less. 
3. I then restored a later snapshot of the problematic cluster to the test cluster and tried backing it up again to the new bucket, and that also took about a minute or less.

However, when I tried deleting the repository full of old snapshots from the problematic cluster and registering a brand new empty bucket, I found that my first snapshot to the new repository was also hanging indefinitely. I finally had to kill my snapshot curl command. There were no errors in the logs (the snapshot logger is very terse...wondering if anyone knows how to increase the verbosity for it).

So my theory seems to have been debunked, and I am again at a loss. I am wondering whether the hanging snapshot is related to the slow snapshots I was seeing before I deleted that old repository. I have seen several issues in GitHub regarding hanging snapshots (<a href="https://github.com/elasticsearch/elasticsearch/issues/5958" target="_blank" onmousedown="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Felasticsearch%2Felasticsearch%2Fissues%2F5958\46sa\75D\46sntz\0751\46usg\75AFQjCNFBLSnI8NC45smQZJ8C5T2EBUAeGg';return true;" onclick="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Felasticsearch%2Felasticsearch%2Fissues%2F5958\46sa\75D\46sntz\0751\46usg\75AFQjCNFBLSnI8NC45smQZJ8C5T2EBUAeGg';return true;">#5958, <a href="https://github.com/elasticsearch/elasticsearch/issues/7980" target="_blank" onmousedown="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Felasticsearch%2Felasticsearch%2Fissues%2F7980\46sa\75D\46sntz\0751\46usg\75AFQjCNE5LrkcqLGXVm0snH7AStrVh4GrqA';return true;" onclick="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Felasticsearch%2Felasticsearch%2Fissues%2F7980\46sa\75D\46sntz\0751\46usg\75AFQjCNE5LrkcqLGXVm0snH7AStrVh4GrqA';return true;">#7980) and have tried using the <a href="https://github.com/imotov/elasticsearch-snapshot-cleanup" target="_blank" onmousedown="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Fimotov%2Felasticsearch-snapshot-cleanup\46sa\75D\46sntz\0751\46usg\75AFQjCNGQZg1-S16q5e4X4wLxtQPFfs7Jdw';return true;" onclick="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Fimotov%2Felasticsearch-snapshot-cleanup\46sa\75D\46sntz\0751\46usg\75AFQjCNGQZg1-S16q5e4X4wLxtQPFfs7Jdw';return true;">elasticsearch-snapshot-cleanup utility on my cluster both before and after I upgraded from version 1.2.2 to 1.4.0 (I thought upgrading to 1.4.0 which included snapshot improvements may fix my issues, but it did not), and the 
script is not finding any running snapshots:

[2014-11-13 05:37:45,451][INFO ][org.elasticsearch.node   ] [Golden Archer] started
[2014-11-13 05:37:45,451][INFO ][org.elasticsearch.org.motovs.elasticsearch.snapshots.AbortedSnapshotCleaner] No snapshots found
[2014-11-13 05:37:45,452][INFO ][org.elasticsearch.node   ] [Golden Archer] stopping ...

Curling to _snapshot/REPO/_status also returns no ongoing snapshots:

curl -XGET 'http://<hostname>:9200/_snapshot/s3_backup_repo/_status?pretty=true'
{
  "snapshots" : [ ]
}

I may try bouncing ES on each node to see if that kills whatever process is causing my requests to the snapshot module to hang (requests to other modules like _cluster/health return fine; cluster health is green, and load is low for both nodes: 0.00, 0.06).

I would really appreciate some help/guidance on how to debug/fix this issue and general recommendations on how to best achieve periodic snapshots. For example, cleaning up old snapshots seems rather difficult since we have to specify the snapshot name, which we would obtain by making a request to the snapshot module, which seems to hang often.

Thanks,
Sally


On Monday, November 10, 2014 12:27:10 AM UTC-8, Pradeep Reddy wrote:
Hi Vineeth,

Thanks for the reply.
I am aware of how to create and delete snapshots using cloud-aws.

What I wanted to know is what the workflow for periodic snapshots should be, especially how to deal with old snapshots. Will having too many old snapshots impact anything?

On Friday, November 7, 2014 8:16:05 PM UTC+5:30, vineeth mohan wrote:
Hi,

There is an S3 repository plugin: https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
Use this.
The snapshots are incremental, so it should fit your purpose perfectly.

Thanks
             Vineeth

Re: What is the best practice for periodic snapshotting with awc-cloud+s3

João Costa
Hello,

Sorry for hijacking this thread, but I'm currently also pondering the best way to perform periodic snapshots in AWS.

My main concern is that we are using blue-green deployment with ephemeral storage on EC2, so if for some reason there is a problem with the cluster, we might lose a lot of data. I would therefore rather take frequent snapshots (for this reason, we are still using the deprecated S3 gateway).

The thing is, you claim that "Having too many snapshots is problematic" and that one should "prune old snapshots". Since snapshots are incremental, this will imply data loss, correct?
Also, is the problem related to the number of snapshots or the size of the data? Is there any way to merge old snapshots into one? Would this solve the problem?

Finally, if I create a cronjob to make automatic snapshots, can I run into problems if two instances attempt to create a snapshot with the same name at the same time?
Also, what's the best way to do a snapshot on shutdown? Should I put a script in init.d/rc.0 to run on shutdown before elasticsearch shuts down? I've seen cases where the EC2 instances have "not so graceful" shutdowns, so I wonder if there is a better way to do this at the cluster level (i.e., if node A notices that node B is not responding, it automatically makes a snapshot).

Sorry if some of these questions don't make much sense; I'm still quite new to elasticsearch and have not completely understood the new snapshot feature.

On Friday, November 14, 2014 08:19:42 UTC, Sally Ahn wrote:
Yes, I am now seeing the snapshots complete in about 2 minutes after switching to a new, empty bucket.
I'm not sure why the initial request to snapshot to the empty repo was hanging because the snapshot did in fact complete in about 2 minutes, according to the S3 timestamp.
Time to automate deletion of old snapshots. :)
Thanks for the response!

On Thursday, November 13, 2014 9:35:20 PM UTC-8, Igor Motov wrote:
Having too many snapshots is problematic. Each snapshot is taken incrementally, so in order to figure out what has changed and what is already available, all snapshots in the repository need to be scanned, which takes longer as the number of snapshots grows. I would recommend pruning old snapshots as time goes by, or starting to snapshot into a new bucket/directory if you really need to maintain 2-hour resolution for 2-month-old snapshots. The get command can sometimes hang because it's throttled by the ongoing snapshot.
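A minimal sketch of the pruning step recommended above, assuming snapshot names embed their start time in the form snapshot-YYYY.MM.DD-HH:MM:SS (the scheme used by the cron job discussed in this thread). The function names, base URL, and retention window are hypothetical, chosen only for illustration; the code selects names and builds the DELETE /_snapshot/<repo>/<snapshot> URL but does not send any request.

```python
from datetime import datetime, timedelta

# Assumed naming scheme: snapshot-YYYY.MM.DD-HH:MM:SS (hypothetical constant name).
NAME_FORMAT = "snapshot-%Y.%m.%d-%H:%M:%S"


def snapshots_to_prune(names, now, keep_for):
    """Return, sorted, the names whose embedded timestamp is older than now - keep_for."""
    cutoff = now - keep_for
    stale = []
    for name in names:
        try:
            taken_at = datetime.strptime(name, NAME_FORMAT)
        except ValueError:
            # Names that do not follow the scheme are left alone.
            continue
        if taken_at < cutoff:
            stale.append(name)
    return sorted(stale)


def delete_url(base_url, repo, name):
    """Build the URL a DELETE request for one snapshot would target."""
    return "%s/_snapshot/%s/%s" % (base_url.rstrip("/"), repo, name)


if __name__ == "__main__":
    names = ["snapshot-2014.09.30-10:00:01", "snapshot-2014.11.11-16:00:01"]
    now = datetime(2014, 11, 12)
    for stale_name in snapshots_to_prune(names, now, timedelta(days=7)):
        print("DELETE", delete_url("http://localhost:9200", "s3_backup_repo", stale_name))
```

Because the names are derived from the schedule, a cron job could also compute the stale names directly from the clock instead of listing the repository first, which may help when the list request itself is slow.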


On Wednesday, November 12, 2014 9:02:33 PM UTC-10, Sally Ahn wrote:
I am also interested in this topic.
We were snapshotting our cluster of two nodes every 2 hours (invoked via a cron job) to an S3 repository (we were running ES 1.2.2 with cloud-aws-plugin version 2.2.0, then we upgraded to ES 1.4.0 with cloud-aws-plugin 2.4.0 but are still seeing issues described below).
I've been seeing an increase in the time it takes to complete a snapshot with each subsequent snapshot. 
I see a thread (https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ) where someone else was seeing the same thing, but that thread seems to have died.
In my case, snapshots have gone from taking ~5 minutes to taking about an hour, even between snapshots where data does not seem to have changed. 

For example, you can see below a list of the snapshots stored in my S3 repo. Each snapshot is named with a timestamp of when my cron job invoked the snapshot process. The S3 timestamp on the left shows the completion time of that snapshot, and it's clear that it's steadily increasing:

2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01
2014-09-30 12:05       686   s3://<bucketname>/snapshot-2014.09.30-12:00:01
2014-09-30 14:05       736   s3://<bucketname>/snapshot-2014.09.30-14:00:01
2014-09-30 16:05       736   s3://<bucketname>/snapshot-2014.09.30-16:00:01
...
2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01
2014-11-08 02:54      1488   s3://<bucketname>/snapshot-2014.11.08-02:00:01
...
2014-11-08 14:54      1488   s3://<bucketname>/snapshot-2014.11.08-14:00:01
2014-11-08 16:53      1488   s3://<bucketname>/snapshot-2014.11.08-16:00:01
...
2014-11-11 07:00      1638   s3://<bucketname>/snapshot-2014.11.11-06:00:01
2014-11-11 08:58      1638   s3://<bucketname>/snapshot-2014.11.11-08:00:01
2014-11-11 10:58      1638   s3://<bucketname>/snapshot-2014.11.11-10:00:01
2014-11-11 12:59      1638   s3://<bucketname>/snapshot-2014.11.11-12:00:01
2014-11-11 15:00      1638   s3://<bucketname>/snapshot-2014.11.11-14:00:01
2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01
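The slowdown in the listing above can be quantified with a short sketch that compares the start time embedded in each snapshot name with the S3 timestamp on the same line. It assumes both timestamps are in the same time zone and that the S3 timestamp has only minute resolution; the function name is hypothetical.

```python
from datetime import datetime


def snapshot_duration_minutes(listing_line):
    """Return the approximate duration, in whole minutes, for one listing line.

    Expected layout: '<date> <time> <size> s3://<bucket>/snapshot-<start time>'.
    """
    date_part, time_part, _size, path = listing_line.split()
    # Completion time as reported by S3 (minute resolution).
    finished = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M")
    # Start time as encoded in the snapshot name by the cron job.
    name = path.rsplit("/", 1)[-1]
    started = datetime.strptime(name, "snapshot-%Y.%m.%d-%H:%M:%S")
    return round((finished - started).total_seconds() / 60.0)


if __name__ == "__main__":
    first = "2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01"
    last = "2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01"
    print(snapshot_duration_minutes(first), snapshot_duration_minutes(last))
```

On the first and last lines of the listing this gives roughly 5 and 60 minutes, matching the figures quoted earlier in this message.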

I suspected that this gradual increase was related to the accumulation of old snapshots after I tested the following:
1. I created a brand new cluster with the same hardware specs in the same datacenter and restored a snapshot of the problematic cluster taken a few days back (i.e. not the latest snapshot). 
2. I then backed up that restored data to a new empty bucket in the same S3 region, and that was very fast...a minute or less. 
3. I then restored a later snapshot of the problematic cluster to the test cluster and tried backing it up again to the new bucket, and that also took about a minute or less.

However, when I tried deleting the repository full of old snapshots from the problematic cluster and registering a brand new empty bucket, I found that my first snapshot to the new repository was also hanging indefinitely. I finally had to kill my snapshot curl command. There were no errors in the logs (the snapshot logger is very terse...wondering if anyone knows how to increase the verbosity for it).

Re: What is the best practice for periodic snapshotting with awc-cloud+s3

Aaron Mildenstein
I will include my response to the original post:

Snapshots are at the segment level.  The more segments stored in the repository, the more segments will have to be compared to those in each successive snapshot.  With merges taking place continually in an active index, you may end up with a considerable number of "orphaned" segments stored in your repository, i.e. segments "backed up," but no longer directly correlating to a segment in your index.  Checking through these may be contributing to the increased amount of time between snapshots.  

Consider pruning older snapshots.  "Orphaned" segments will be deleted, and any segments still referenced will be preserved.
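To illustrate why pruning does not lose data that newer snapshots still need, here is a toy model of the behaviour described above. It is only a sketch under simplifying assumptions: the repository is modelled as a dictionary from snapshot name to the set of segment files it references, which is not the real on-disk repository format, and all names are hypothetical.

```python
def delete_snapshot(repo, name):
    """Remove one snapshot from the toy repository.

    repo maps snapshot name -> set of segment file names it references.
    Returns the segments that are no longer referenced by any remaining
    snapshot, i.e. the "orphaned" ones that could be physically deleted.
    """
    removed = repo.pop(name)
    # Segments referenced by at least one remaining snapshot must be kept.
    still_referenced = set().union(*repo.values())
    return removed - still_referenced


if __name__ == "__main__":
    repo = {
        "snap-1": {"seg_a", "seg_b"},
        "snap-2": {"seg_b", "seg_c"},  # seg_a was merged away before snap-2
    }
    print(delete_snapshot(repo, "snap-1"))  # only seg_a is orphaned
    print(repo)                             # snap-2 still has seg_b and seg_c
```

In this model, deleting the older snapshot frees only the segment that no other snapshot uses, so the newer snapshot remains fully restorable.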

On Thursday, November 20, 2014 7:22:03 AM UTC-5, João Costa wrote: