ES backups without using snapshots?


Mathew D
Hi there,

Any suggestions as to how I can create full ES backups without using snapshot functionality?

The reason I can't use snapshots is that they require a shared directory mounted on all nodes, but my 3-node cluster spans two data centres and I am not able to NFS mount over the WAN. I'm also not permitted to back up to AWS/S3.

As I have 2 replicas of each index, I'm leaning towards the idea of stopping one node and backing up that node's data directory but wondered if anyone could suggest a more elegant way.  For example, could I snapshot to a local directory on each node, then manually combine the contents into a single cohesive backup?

Regards,
Mat



--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f0b8a931-c423-4a37-a6df-5181bd4db309%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: ES backups without using snapshots?

Ivan Brusic
How many shards for each index? I am assuming that each node does not have all the data.

If you can stop indexing, you can just rsync the data to a local directory. Make sure you execute a flush first, and preferably an optimize as well, in order to merge the segments on disk. The tricky part is the manual combine you referred to.
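
A minimal sketch of the flush, optimize, then rsync sequence described above, assuming indexing has already been stopped. The base URL, data directory, and backup directory are hypothetical placeholders, and `_optimize` is the endpoint name used by releases of this era (later releases renamed it `_forcemerge`). By default the function only reports what it would do.

```python
import shlex
import subprocess
import urllib.request


def maintenance_requests(base_url, optimize=True):
    """Return the (method, url) pairs to issue before copying files."""
    base = base_url.rstrip("/")
    requests = [("POST", base + "/_flush")]
    if optimize:
        # Assumption: 1.x-era endpoint name; newer releases use /_forcemerge.
        requests.append(("POST", base + "/_optimize"))
    return requests


def rsync_command(data_dir, backup_dir):
    """Build the rsync invocation; the trailing slash copies the directory contents."""
    return ["rsync", "-a", "--delete", data_dir.rstrip("/") + "/", backup_dir]


def run_backup(base_url, data_dir, backup_dir, dry_run=True):
    """Flush (and optimize), then rsync the data directory.

    Returns the list of steps as strings. With dry_run=True nothing is sent
    to the cluster and nothing is copied.
    """
    steps = []
    for method, url in maintenance_requests(base_url):
        steps.append(f"{method} {url}")
        if not dry_run:
            request = urllib.request.Request(url, data=b"", method=method)
            urllib.request.urlopen(request).close()
    command = rsync_command(data_dir, backup_dir)
    steps.append(" ".join(shlex.quote(part) for part in command))
    if not dry_run:
        subprocess.run(command, check=True)
    return steps
```

Calling `run_backup("http://localhost:9200", "/var/lib/elasticsearch", "/backup/es")` returns the planned steps without touching anything; pass `dry_run=False` only once indexing is paused. This copies a single node's files, so it does not solve the problem of combining shards held on different nodes.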

BTW, 3 nodes/2 data centers? Sounds like a recipe for trouble. :)

Cheers,

Ivan

On Wed, Nov 19, 2014 at 7:41 PM, Mathew D <[hidden email]> wrote:
Hi there,

Any suggestions as to how I can create full ES backups without using snapshot functionality?

The reason I can't use snapshots is that they require a shared directory mounted on all nodes, but my 3-node cluster spans two data centres and I am not able to NFS mount over the WAN. I'm also not permitted to back up to AWS/S3.

As I have 2 replicas of each index, I'm leaning towards the idea of stopping one node and backing up that node's data directory but wondered if anyone could suggest a more elegant way.  For example, could I snapshot to a local directory on each node, then manually combine the contents into a single cohesive backup?

Regards,
Mat




Re: ES backups without using snapshots?

Evan Tahler
Try https://github.com/taskrabbit/elasticsearch-dump.  You can save your data (& mappings) to JSON. 
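
For anyone who cannot install that tool, the same idea (paging through an index and writing documents out as JSON) can be sketched with the scroll API. This is not how elasticsearch-dump itself is implemented; it is an illustration under stated assumptions. The `http_transport` and `dump_index` helpers are hypothetical names, and the request shapes follow the JSON-body form of the scroll API used by later releases; to my understanding, 2014-era releases passed the scroll ID as the raw request body instead, so that call would need adjusting there.

```python
import json
import urllib.request


def http_transport(base_url):
    """Return a callable that sends a JSON request to the cluster at base_url."""
    def send(method, path, body=None):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return send


def dump_index(send, index, out, page_size=500, keep_alive="1m"):
    """Write every document of `index` to `out`, one JSON object per line.

    `send` is any callable taking (method, path, body) and returning the
    parsed JSON response, e.g. the one built by http_transport().
    Returns the number of documents written.
    """
    page = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": {"match_all": {}}},
    )
    written = 0
    # An empty page of hits means the scroll is exhausted.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            record = {"_id": hit["_id"], "_source": hit["_source"]}
            out.write(json.dumps(record) + "\n")
            written += 1
        page = send(
            "POST",
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
    return written
```

A typical call would be `dump_index(http_transport("http://localhost:9200"), "my_index", open("my_index.jsonl", "w"))`, where the host and index name are placeholders. Mappings are not exported here; they can be fetched separately from `GET /my_index/_mapping`.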

On Wednesday, November 19, 2014 5:32:14 PM UTC-8, Ivan Brusic wrote:
How many shards for each index? I am assuming that each node does not have all the data.

If you can stop indexing, you can just rsync the data to a local directory. Make sure you execute a flush first, and preferably an optimize as well, in order to merge the segments on disk. The tricky part is the manual combine you referred to.

BTW, 3 nodes/2 data centers? Sounds like a recipe for trouble. :)

Cheers,

Ivan


Re: ES backups without using snapshots?

Mathew D
In reply to this post by Ivan Brusic
Hi Ivan,

Thanks for the quick response.  We've got 5 shards per index, so with 2 replicas each node should in theory have a full set of data.  I was hoping that taking the node out of service by stopping it would avoid disruption as a result of pausing indexing, but I couldn't find any documentation to confirm if such an operation would leave the data files in a consistent state that could reliably be used for restore.  
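
The "full set of data" reasoning can be checked rather than assumed. With 5 primaries and 2 replicas there are 5 x 3 = 15 shard copies per index, and because two copies of the same shard are never allocated to the same node, 3 nodes end up with one copy of each shard when everything is assigned. A small sketch that verifies this from the text returned by `GET /_cat/shards?h=index,shard,prirep,state,node`; the function name and the sample rows in the usage note are illustrative, not real cluster output.

```python
from collections import defaultdict


def nodes_with_full_copy(cat_shards_text):
    """Return the nodes holding a STARTED copy of every shard of every index.

    Expects one row per shard copy with the columns
    index, shard, prirep, state, node (node may be empty when unassigned).
    """
    all_shards = set()
    held_by_node = defaultdict(set)
    for line in cat_shards_text.splitlines():
        columns = line.split()
        if len(columns) < 4:
            continue  # skip blank or malformed rows
        index, shard, _prirep, state = columns[:4]
        all_shards.add((index, shard))
        # Node names may contain spaces, so rejoin the remaining columns.
        node = " ".join(columns[4:])
        if state == "STARTED" and node:
            held_by_node[node].add((index, shard))
    return sorted(
        node for node, held in held_by_node.items() if held == all_shards
    )
```

For example, given the rows `logs 0 p STARTED node-a`, `logs 0 r STARTED node-b`, `logs 1 r STARTED node-a`, and `logs 1 p STARTED node-b`, both nodes are reported as holding a full copy. Only a node that appears in the result is a candidate for the stop-and-copy approach.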

Evan's suggestion of elasticdump looks like the closest to what I'm after, although unfortunately I don't have node.js/npm installed (and, this being an enterprise environment, getting it installed could be tricky).

NB I hear your concerns re cluster design. Incorporating the remote node was chosen to minimise data loss following a data centre failure; however, because of the risk of split brain, the node actually functions more as a warm DR site than as any sort of HA...

Regards,
Mat




Re: ES backups without using snapshots?

Ivan Brusic
I have never used plugins, but there is also Jorg's tool: https://github.com/jprante/elasticsearch-knapsack

-- 
Ivan

On Wed, Nov 19, 2014 at 11:27 PM, Mathew D <[hidden email]> wrote:
Hi Ivan,

Thanks for the quick response.  We've got 5 shards per index, so with 2 replicas each node should in theory have a full set of data.  I was hoping that taking the node out of service by stopping it would avoid disruption as a result of pausing indexing, but I couldn't find any documentation to confirm if such an operation would leave the data files in a consistent state that could reliably be used for restore.  

Evan's suggestion of elasticdump looks like the closest to what I'm after, although unfortunately I don't have node.js/npm installed (and, this being an enterprise environment, getting it installed could be tricky).

NB I hear your concerns re cluster design. Incorporating the remote node was chosen to minimise data loss following a data centre failure; however, because of the risk of split brain, the node actually functions more as a warm DR site than as any sort of HA...

Regards,
Mat


