how to safely clean old documents (by date)


AALISHE
Hi,

I have ES 0.20.3 with a single index replicated on 2 servers, with 5 shards (size: 57.2gb, docs: 36060297).

I have webpages (the docs) indexed since 2013, and I want to delete everything except the last year's worth of documents.

How can I do this safely on a production setup?

I am thinking of the following:

1- Make a copy of the current index and put it next to it, with a different name of course (how do I accomplish this?)
2- Delete documents from before May 2014 from the copied index
3- Rename the old index and leave it, or delete it

OR

1- Make an empty index
2- Pull documents from after May 2014 from the current index into the new one (how do I accomplish this?)
3- Rename the old index and leave it, or delete it

I appreciate your help, guys.
Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f420f325-9f60-49d9-a9f6-d56528f32a99%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: how to safely clean old documents (by date)

dadoonet
Definitely the second option.
Use scan and scroll (search for reindex on the website). 

Instead of renaming, I would use aliases and switch the alias from old to new index.

Then close or remove the old index.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
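
The alias switch suggested above can be sketched as a single `POST /_aliases` request that removes the alias from the old index and adds it to the new one in the same call. This is only a minimal sketch of the request body: the names `webpages`, `webpages_v1` and `webpages_v2` are hypothetical, and it assumes the application already searches through the alias rather than the index name (an alias cannot share a name with an existing index).

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` in one request."""
    return {
        "actions": [
            # Both actions travel in the same request, so searches on the
            # alias are never left without an index behind it.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def alias_swap_json(alias, old_index, new_index):
    """Serialize the body so it can be sent with any HTTP client."""
    return json.dumps(alias_swap_body(alias, old_index, new_index))
```

Sending the output of `alias_swap_json("webpages", "webpages_v1", "webpages_v2")` to `/_aliases` would point the alias at the new index; the old index can then be closed or removed as a separate step.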


Re: how to safely clean old documents (by date)

AALISHE
Thanks David!

Do you know how I perform step (2): pull documents from after May 2014 from the current index into the new one?




Re: how to safely clean old documents (by date)

dadoonet
Searching for reindex in docs would have directed you to http://www.elastic.co/guide/en/elasticsearch/guide/current/reindex.html

David
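
As a rough illustration of the reindex approach the linked page describes (scroll through the matching documents, then bulk-index each page into the new index), the two pure helpers below build the request bodies involved. This is a sketch under assumptions, not the method from the guide: the date field name `indexed_at`, the cutoff string and the index name `webpages_v2` are hypothetical, the `gte` range syntax should be checked against the 0.20.3 documentation, and the HTTP calls to the search, scroll and `_bulk` endpoints are left to whichever client is in use.

```python
import json


def recent_docs_query(date_field, cutoff):
    """Search body matching documents whose date_field is on or after cutoff."""
    return {"query": {"range": {date_field: {"gte": cutoff}}}}


def hits_to_bulk(hits, target_index):
    """Turn one page of scroll hits into a newline-delimited _bulk payload."""
    lines = []
    for hit in hits:
        # Keep the original type and id so re-running a page overwrites
        # documents instead of duplicating them.
        action = {
            "index": {
                "_index": target_index,
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"
```

A driver loop would send `recent_docs_query("indexed_at", "2014-05-01")` as the initial scan search, then pass each page of `hits` through `hits_to_bulk(..., "webpages_v2")` until a scroll request returns no more hits.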


Re: how to safely clean old documents (by date)

Mark Walkom-2
Just a side note: if you are using time-based data, then it makes a lot of sense to use time-based indices, i.e. daily, weekly, or monthly.
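
To make the time-based idea concrete, here is a small sketch of monthly index naming and of picking which indices have aged out. The `webpages-YYYY.MM` naming scheme, the prefix and the 12-month retention are assumptions chosen for illustration, not something specified in this thread. With this layout, cleaning old data means deleting whole indices rather than deleting documents by date.

```python
from datetime import date


def monthly_index(prefix, day):
    """Name of the monthly index a document dated `day` would go into."""
    return "%s-%04d.%02d" % (prefix, day.year, day.month)


def expired_indices(index_names, prefix, today, keep_months=12):
    """Return the monthly indices that are more than keep_months old."""
    # Count months since year 0 so month arithmetic crosses year boundaries.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of ours
        try:
            year, month = name[len(prefix) + 1:].split(".")
            stamp = int(year) * 12 + (int(month) - 1)
        except ValueError:
            continue  # suffix is not in YYYY.MM form
        if stamp < cutoff:
            expired.append(name)
    return expired
```

For example, with `today = date(2015, 5, 4)` and the default retention, an index named `webpages-2014.04` is reported as expired while `webpages-2014.05` is kept.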
