How to clear data out of an index


How to clear data out of an index

danpolites
How do you clear the data out of an index without deleting that index?
We are using elasticsearch in a Grails application. During development
when we end up with rogue objects in the index, we can't remove them
because they no longer exist in the database. We added an admin
function that allows us to delete and recreate the index, but we don't
need to delete the index entirely. We only need to clear it.

Re: How to clear data out of an index

Clinton Gormley-2
In reply to this post by danpolites (Thu, 2011-07-28 12:51 -0700)

Use the delete API or the delete_by_query API.
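For readers on a current Elasticsearch release, here is a minimal sketch of both calls. It assumes Elasticsearch 7.x or later, where a single document is deleted with `DELETE /<index>/_doc/<id>` and delete-by-query is `POST /<index>/_delete_by_query`; the 2011-era endpoints linked in this thread used different paths. The host, the index name `customer_42`, and the helper names are hypothetical. The builders only describe each request; `send` is an optional standard-library helper to issue one.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Hypothetical cluster address; adjust for your environment.
BASE_URL = "http://localhost:9200"

# (HTTP method, path, JSON body or None)
EsRequest = Tuple[str, str, Optional[dict]]


def delete_doc_request(index: str, doc_id: str) -> EsRequest:
    # Delete API for one document (Elasticsearch 7.x+ path layout, assumed).
    return ("DELETE", f"/{index}/_doc/{doc_id}", None)


def clear_index_request(index: str) -> EsRequest:
    # Delete-by-query with match_all removes every document but keeps
    # the index itself, with its settings and mappings.
    return ("POST", f"/{index}/_delete_by_query", {"query": {"match_all": {}}})


def send(req: EsRequest, base_url: str = BASE_URL) -> dict:
    # Issue the request with the standard library and decode the JSON reply.
    method, path, body = req
    data = json.dumps(body).encode("utf-8") if body is not None else None
    http_req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(http_req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(clear_index_request("customer_42"))
```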

clint


Re: How to clear data out of an index

Vladimir Shkurin
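In reply to this post by danpolites
Have you seen this? :)

http://www.elasticsearch.org/guide/reference/api/delete-by-query.html

http://www.elasticsearch.org/guide/reference/api/delete.html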

Re: How to clear data out of an index

danpolites
Yes, I have seen those many times ;). Maybe the way I am doing it is
the best way to do it. I was looking for a more convenient API call
that would just clear all of the documents in the index without
deleting the index, so that I wouldn't have to set up the index again. I
suppose a 'delete by query' would work for that, but I'm hesitant to
use it given the warning on that page:

"Also, it is not recommended to delete “large chunks of the data in an
index”, many times, its better to simply reindex into a new index."

Why is this not recommended? We can't reindex into a new index because
we create a new index for every one of our customers, and the
index names are unique to the customer. All of our objects have a
customer ID field that our search service uses to choose the correct
index. Again, this idea of deleting and recreating an index to clear
it out is mainly a convenience for development. Hopefully we
won't have rogue documents in production, but if we do, we need to be
able to clean the indices quickly, because we use elasticsearch results
for all of the list views on the site, while the show pages are MongoDB
calls. Our current solution of deleting the index and then recreating
it works OK, but reindexing is really slow with large
amounts of data. Maybe this is more of a best-practice
question. How should this be handled?

On Jul 28, 5:02 pm, Vladimir Shkurin <[hidden email]> wrote:
> Have you seen this? :)
>
> http://www.elasticsearch.org/guide/reference/api/delete-by-query.html
>
> http://www.elasticsearch.org/guide/reference/api/delete.html

Re: How to clear data out of an index

dadoonet
I think that you should use one index per customer and create a main alias on top of them.

When you need to clean a customer index, drop it, create it again, and add it back to the alias.
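A minimal sketch of that reset, assuming a current Elasticsearch with the create-index, delete-index, and `POST /_aliases` APIs. The function name, the index `customer_42`, the alias `all_customers`, and the `customerId` mapping are hypothetical. It only builds the ordered list of requests; sending them is left to whatever client you use.

```python
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None)
EsRequest = Tuple[str, str, Optional[dict]]


def reset_customer_index(
    customer_index: str, alias: str, index_body: dict
) -> List[EsRequest]:
    """Return the requests that drop, recreate, and re-alias one customer index.

    index_body is the create-index payload (settings and mappings) the
    application already uses when it first provisions the customer.
    """
    return [
        # Dropping the index also removes it from every alias it belonged to.
        ("DELETE", f"/{customer_index}", None),
        # Recreate it empty, with the same settings and mappings as before.
        ("PUT", f"/{customer_index}", index_body),
        # Add it back to the shared alias that searches go through.
        (
            "POST",
            "/_aliases",
            {"actions": [{"add": {"index": customer_index, "alias": alias}}]},
        ),
    ]


if __name__ == "__main__":
    body = {"mappings": {"properties": {"customerId": {"type": "keyword"}}}}
    for req in reset_customer_index("customer_42", "all_customers", body):
        print(req)
```

Because deleting an index drops its alias membership, the alias is re-added last. The create-index body can also carry an `aliases` section, which would fold the last two steps into one.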

It will take only a few milliseconds. Deleting documents one by one (even with a query) will cost you much more, and as you said, you want to remove a customer's data quickly.


Hope this helps
David

In reply to this post by danpolites (29 Jul 2011, 03:16)

Re: How to clear data out of an index

kimchy
Administrator
In reply to this post by danpolites
The reason for the warning is that when you delete a large part of your index, the documents aren't actually removed right away; they are only marked as deleted in the Lucene index. They eventually get merged out of the index (reclaiming space and optimizing the index).

So, if you end up deleting a large portion of the index, it sometimes makes sense to reindex the data instead, as that produces a more optimized index. It really depends on the use case, of course.
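The effect described above can be illustrated with a toy model. This is a deliberate simplification, not Lucene's actual data structures or merge policy: each hypothetical `Segment` keeps its documents plus a set of positions marked deleted, and `merge` rewrites every surviving document into one new segment. Real merges are chosen by a merge policy and usually combine only some segments.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Segment:
    # Toy stand-in for a Lucene segment: an immutable doc list plus
    # the positions that have been marked deleted.
    docs: List[str]
    deleted: Set[int] = field(default_factory=set)

    def live_docs(self) -> List[str]:
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


class ToyIndex:
    def __init__(self) -> None:
        self.segments: List[Segment] = []

    def add_segment(self, docs: List[str]) -> None:
        self.segments.append(Segment(list(docs)))

    def delete(self, doc: str) -> None:
        # A delete only marks the doc; its data stays in the segment.
        for seg in self.segments:
            for i, d in enumerate(seg.docs):
                if d == doc:
                    seg.deleted.add(i)

    def stored_docs(self) -> int:
        # Everything still occupying space, deleted or not.
        return sum(len(seg.docs) for seg in self.segments)

    def live_count(self) -> int:
        return sum(len(seg.live_docs()) for seg in self.segments)

    def merge(self) -> None:
        # Rewrite live docs into one new segment; deleted docs are dropped.
        survivors = [d for seg in self.segments for d in seg.live_docs()]
        self.segments = [Segment(survivors)] if survivors else []


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add_segment([f"doc{i}" for i in range(10)])
    for i in range(8):
        idx.delete(f"doc{i}")
    print(idx.stored_docs(), idx.live_count())
    idx.merge()
    print(idx.stored_docs(), idx.live_count())
```

Deleting 8 of the 10 documents leaves all 10 stored until the merge runs; only then is the space reclaimed, which is why reindexing can yield a leaner index after a large delete.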


Re: How to clear data out of an index

Michael Sokolov
In reply to this post by danpolites
This is an old thread, but I have a variation on the same question: I've seen various recommendations to drop and recreate indexes rather than delete all documents. I just want to know whether there is anything in elasticsearch that maps to IndexWriter.deleteAll, since that is an efficient way to empty an index without having to recreate it. It would be convenient if, say, deleteByQuery("*:*") triggered that. Does it?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e55f492c-b027-4a05-9546-dc2b7f7d468d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How to clear data out of an index

Itamar Syn-Hershko
Because of the distributed nature of ES, the equivalent would be to delete the index and create it again using the same mapping.
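One way to sketch "delete and recreate with the same mapping" is to capture the mapping and settings first and build the create-index body from them. This assumes the Elasticsearch 7.x response shapes of `GET /<index>/_mapping` and `GET /<index>/_settings`; the helper name and the list of cluster-generated settings to strip are assumptions and may be incomplete for your version.

```python
import copy
from typing import Any, Dict

# Settings the cluster assigns itself and that, as far as we know, cannot be
# supplied when creating an index. Assumed list; may differ by version.
GENERATED_SETTINGS = ("uuid", "creation_date", "version", "provided_name")


def recreate_body(index: str, mapping_resp: Dict[str, Any],
                  settings_resp: Dict[str, Any]) -> Dict[str, Any]:
    """Build a create-index body that reproduces an existing index.

    mapping_resp is the JSON from GET /<index>/_mapping and settings_resp
    the JSON from GET /<index>/_settings, both captured before the delete.
    The inputs are deep-copied, so they are left unmodified.
    """
    mappings = copy.deepcopy(mapping_resp[index]["mappings"])
    index_settings = copy.deepcopy(settings_resp[index]["settings"]["index"])
    for key in GENERATED_SETTINGS:
        index_settings.pop(key, None)
    return {"settings": {"index": index_settings}, "mappings": mappings}
```

The order would be: GET the mapping and settings, DELETE the index, then PUT it again with the returned body. Aliases are not carried over by this sketch and would need to be re-added separately.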

--

Itamar Syn-Hershko
http://code972.com | @synhershko
Freelance Developer & Consultant


In reply to this post by Michael Sokolov (Sun, Apr 6, 2014 at 4:00 AM)