How to reindex ElasticSearch quickly?

Dmitry Babitsky
I have an Elasticsearch index with around 200M documents and a total index size of 90 GB.

I changed the mapping, so I would like Elasticsearch to re-index all the documents.

I wrote a script that creates a new index (with the new mapping), then goes over all the documents in the old index and puts them into the new one.

It seems to work, but the problem is that it is extremely slow.
It started at 300 documents/minute two days ago, and now the speed is 150 documents/minute.

The script runs on a machine in the same network as the Elasticsearch machines.

At this speed, the re-index will take a month to finish.
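As a rough check on that estimate, the small helper below (days_to_reindex is an illustrative name) converts a document count and a constant rate into days. Read literally as per-minute rates, the figures above work out to well over a year rather than a month, so the units are worth double-checking; at 150 documents per second, for comparison, 200M documents would take roughly 15 days.

```python
def days_to_reindex(total_docs: int, docs_per_minute: float) -> float:
    """Estimate wall-clock days to copy total_docs at a constant rate."""
    minutes = total_docs / docs_per_minute
    return minutes / (60 * 24)


# The two rates quoted above, read as documents per minute.
for rate in (300, 150):
    print(f"{rate} docs/min -> {days_to_reindex(200_000_000, rate):.0f} days")
# 300 docs/min -> 463 days
# 150 docs/min -> 926 days
```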


Does anybody know of a faster technique to re-index an Elasticsearch index?

Thanks in advance!!!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: How to reindex ElasticSearch quickly?

doug livesey
It could be worth looking at the bulk operations -- we rebuild an admittedly much smaller index by using the bulk API & loading 2000 documents in each operation.
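The batching side of that approach can be sketched with the standard library: group an iterator of documents into lists of 2000, one list per bulk request. The helper name batched is illustrative and not part of any Elasticsearch client (Python 3.12+ also ships itertools.batched, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 2000) -> Iterator[List[T]]:
    """Yield successive lists of up to batch_size items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# 4,500 documents -> bulk requests of 2000, 2000 and 500 documents
sizes = [len(b) for b in batched(range(4500))]
print(sizes)  # [2000, 2000, 500]
```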


In reply to Dmitry Babitsky's post of 9 June 2013 09:03.

Re: How to reindex ElasticSearch quickly?

vineeth mohan
https://github.com/karussell/elasticsearch-reindex

Thanks
           Vineeth

In reply to doug livesey's post of Sun, Jun 9, 2013 at 1:43 PM.

Re: How to reindex ElasticSearch quickly?

Dmitry Babitsky
Two questions about the reindex plug-in:
1. Is it possible to reindex an existing index into a new one, so it would run offline?
2. I could not understand from the reindex plug-in README the right way to run it so that it reindexes the entire index, without any counters...

Thanks,
Dmitry.

In reply to Vineeth Mohan's post of Sunday, June 9, 2013 11:51:30 AM UTC+3.

Re: How to reindex ElasticSearch quickly?

Dmitry Babitsky
In reply to this post by doug livesey
The idea of bulk indexing sounds very good!
One question - how do you perform the bulk read?

Thanks a lot!!!


Re: How to reindex ElasticSearch quickly?

doug livesey
IIRC, you can query for a bunch of documents, and they'll be returned (nested in the response) in an array. Those queries take limit and offset options (size and from).
Once you have your array of documents, you can feed that to the bulk index API.
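A sketch of the limit/offset approach: each page is a match_all search whose body sets from (the offset) and size (the limit). The helper page_bodies is hypothetical and only builds the JSON request bodies; sending each one to the old index's _search endpoint is left out.

```python
import json
from typing import Iterator


def page_bodies(total_docs: int, page_size: int = 2000) -> Iterator[str]:
    """Yield match_all search bodies that walk an index with from/size paging."""
    for offset in range(0, total_docs, page_size):
        body = {
            "query": {"match_all": {}},
            "from": offset,     # the "offset": how many hits to skip
            "size": page_size,  # the "limit": how many hits to return
        }
        yield json.dumps(body)


bodies = list(page_bodies(5000))
print(len(bodies))                     # 3
print(json.loads(bodies[-1])["from"])  # 4000
```

Each page is a separate search, so without an explicit sort the order across pages is not guaranteed to be stable, and the cost of a page grows with its offset. Newer Elasticsearch versions also reject from + size beyond index.max_result_window (10,000 by default).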


In reply to Dmitry Babitsky's post of 9 June 2013 10:40.

Re: How to reindex ElasticSearch quickly?

vineeth mohan
In reply to this post by Dmitry Babitsky
Yes, you can reindex an existing index.
You just need to create the new index and send the reindex command for it.
Secondly, as for why it is fast: it runs inside Elasticsearch, which eliminates the network latency you get when doing the same thing from outside.
It also uses scan to "bulk read" and bulk insert to copy, so all the fast options are used here.
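For this plug-in, the reindex command is an HTTP PUT to the new index's _reindex endpoint, with the source index and type passed as query parameters. The sketch below only builds that URL, using the parameter names that appear in the curl command later in this thread (searchIndex, searchType, hitsPerPage); the function name, host and index names are placeholders, and the plug-in's README is the authority on what else it accepts.

```python
from urllib.parse import urlencode


def reindex_url(host: str, new_index: str, doc_type: str,
                search_index: str, search_type: str,
                hits_per_page: int = 2000) -> str:
    """Build a _reindex URL for the elasticsearch-reindex plug-in."""
    params = urlencode({
        "searchIndex": search_index,   # index to read from
        "searchType": search_type,     # type to read from
        "hitsPerPage": hits_per_page,  # batch size per scan page
    })
    return f"{host}/{new_index}/{doc_type}/_reindex?{params}"


print(reindex_url("http://localhost:9200", "my_index_new", "my_type",
                  "my_index", "my_type"))
```

The resulting URL is what gets passed to curl -XPUT.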

Thanks
          Vineeth

In reply to Dmitry Babitsky's post of Sun, Jun 9, 2013 at 2:33 PM.

Re: How to reindex ElasticSearch quickly?

Dmitry Babitsky
In reply to this post by doug livesey
Don't you get long delays when you use high offsets?
For example, an offset of 800,000 on my DB gives a delay of about 30 seconds between sending the search command and starting to receive the documents.
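One likely reason: with from/size paging, each shard has to collect its own top from + size hits, and the node coordinating the search then merges shards × (from + size) entries just to return size documents. A back-of-the-envelope sketch, assuming 5 shards (the default in Elasticsearch versions of this era; the actual shard count of this index isn't stated):

```python
def entries_to_merge(shards: int, offset: int, size: int) -> int:
    """Hits the coordinating node must sort for one from/size page."""
    return shards * (offset + size)


# Assumed 5 shards; offset and page size taken from this thread.
print(entries_to_merge(5, 800_000, 2000))  # 4010000
print(entries_to_merge(5, 0, 2000))        # 10000
```

The work per page grows linearly with the offset, which is why a scan/scroll cursor is the usual way to read a whole index.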

In reply to doug livesey's post of Sunday, June 9, 2013 1:22:58 PM UTC+3.

Re: How to reindex ElasticSearch quickly?

Dmitry Babitsky
In reply to this post by doug livesey
Hi Doug,

I tried your approach, but did not get any time improvement.
After some debugging I found out that the bulk=True flag in my index command has no effect.

The code that I used is:
    search_obj = pyes.query.Search(query=pyes.query.MatchAllQuery(), start=resume_from)
    old_index_iterator = self.esconn.search(search_obj, self.index_name)
    counter = 0
    BULK_SIZE = 2000

    for doc in old_index_iterator:
        self.esconn.index(doc=doc, doc_type=DOC_TYPE, index=new_index_name, id=doc.get_id(), bulk=True)
        counter += 1

        if counter % BULK_SIZE == 0:
            self.logger.debug("Refreshing...")
            self.esconn.refresh()
            self.logger.debug("Refresh done.")

    self.esconn.refresh()

Could you please let me know if you use any other pyes API for bulk inserts?

Thanks!!!


Re: How to reindex ElasticSearch quickly?

Boaz Leskes
Hi Dmitry,

You should use the scan search type: http://www.elasticsearch.org/guide/reference/api/search/search-type/

In pyes I believe the option is scan=True. Here is a snippet I wrote a while ago, perhaps with an older version than the one you use.

    result_set = es_client.search(q, indices="index", scan=True, size=batch_size)
    ## PATCH pyes for a scanning bug
    result_set._max_item = None

result_set is now an iterable you can read documents from. pyes will make more calls to Elasticsearch when needed.
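For reference, the scan protocol behind such an iterator works roughly like this in the REST API of this era: an initial search with search_type=scan and a scroll timeout returns a _scroll_id but no hits, and each follow-up call to /_search/scroll with that id returns the next batch, until a batch comes back empty. The sketch below takes a hypothetical fetch(scroll_id) callable standing in for the HTTP call, so the loop logic can be shown without a cluster. Note that with scan, size applies per shard, and that later Elasticsearch versions removed search_type=scan in favour of a scroll sorted on _doc.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes the current scroll id (None for the initial
# scan search) and returns the parsed JSON response as a dict.
Fetch = Callable[[Optional[str]], Dict]


def scan_documents(fetch: Fetch) -> Iterator[Dict]:
    """Yield every hit from a scan/scroll session until a batch is empty."""
    response = fetch(None)  # initial scan request: no hits, just a scroll id
    scroll_id = response["_scroll_id"]
    while True:
        response = fetch(scroll_id)
        hits: List[Dict] = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Always continue with the most recent scroll id.
        scroll_id = response["_scroll_id"]


def make_fake_fetch(docs: List[Dict], batch: int) -> Fetch:
    """A stand-in transport that serves docs in batches of `batch`."""
    state = {"pos": 0}

    def fetch(scroll_id: Optional[str]) -> Dict:
        if scroll_id is None:
            return {"_scroll_id": "s0", "hits": {"hits": []}}
        start = state["pos"]
        state["pos"] += batch
        return {"_scroll_id": f"s{state['pos']}",
                "hits": {"hits": docs[start:start + batch]}}

    return fetch


fake_docs = [{"_id": str(i), "_source": {"n": i}} for i in range(5)]
ids = [hit["_id"] for hit in scan_documents(make_fake_fetch(fake_docs, 2))]
print(ids)  # ['0', '1', '2', '3', '4']
```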

Cheers,
Boaz

In reply to Dmitry Babitsky's post of Tuesday, June 11, 2013 8:43:46 AM UTC+2.

Re: How to reindex ElasticSearch quickly?

Dmitry Babitsky
Hi Boaz,

Thanks a lot for your answer.
According to my measurements, however, the bottleneck is in index, which ignores the bulk=True flag, not in search...


Dmitry.

In reply to Boaz Leskes's post of Tuesday, June 11, 2013 11:46:10 AM UTC+3.

Re: How to reindex ElasticSearch quickly?

Boaz Leskes
Hi Dmitry,

I'll have to dive into the pyes code to see why it goes wrong, but for speed you really need to use the bulk API for indexing and the scan search type together. If pyes is in your way, you can easily construct the request yourself using json.dumps (watch out for unicode data and encoding). More info here: http://www.elasticsearch.org/guide/reference/api/bulk/
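A minimal sketch of building the bulk request by hand with json.dumps, assuming the newline-delimited format described on that page: an action line naming the target index, type and id, then the document source on the next line, with a newline after the last line too. json.dumps escapes newlines inside strings, so each document stays on one line, and ensure_ascii=False plus an explicit UTF-8 encode keeps non-ASCII text intact. The function names and the index/type names are hypothetical; the _type field matches the Elasticsearch versions of this era (mapping types were removed later).

```python
import json
import urllib.request
from typing import Dict, Iterable


def build_bulk_body(hits: Iterable[Dict], index: str, doc_type: str) -> bytes:
    """Turn search hits into a UTF-8 encoded _bulk request body."""
    lines = []
    for hit in hits:
        # Action line: where the following document should be indexed.
        action = {"index": {"_index": index, "_type": doc_type, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself, keeping non-ASCII characters as-is.
        lines.append(json.dumps(hit["_source"], ensure_ascii=False))
    if not lines:
        return b""  # nothing to send
    # Every line, including the last one, must end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(body: bytes, url: str = "http://localhost:9200/_bulk") -> bytes:
    """POST a bulk body and return the raw JSON response (not called below)."""
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


body = build_bulk_body(
    [{"_id": "1", "_source": {"title": "Café"}}],
    index="my_index_new",  # hypothetical target index
    doc_type="my_type",    # hypothetical type
)
print(body.decode("utf-8"))
```

The bulk response reports success or failure per item, so it is worth checking it rather than assuming the whole batch was indexed.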

Cheers,
Boaz

In reply to Dmitry Babitsky's post of Tuesday, June 11, 2013 11:10:06 AM UTC+2.

Re: How to reindex ElasticSearch quickly?

Dmitry Babitsky
In reply to this post by vineeth mohan
Hi Vineeth,

I've installed the _reindex plug-in, and it works very fast indeed.
I only have one small problem: I trigger the plug-in with a curl -XPUT command on the server Elasticsearch runs on (localhost), and every time I run it, it hangs up after several hours.
The error message that I see coming back from curl is: curl: (52) Empty reply from server

    ubuntu@elasticsearch-test:~$ date; time curl -XPUT 'http://localhost:9200/my_index_2013_06_19_reindexed/my_type/_reindex?searchIndex=my_index&searchType=my_type&hitsPerPage=2000'; date
    Wed Jun 19 14:22:20 UTC 2013
    curl: (52) Empty reply from server

    real    257m28.136s
    user    0m0.216s
    sys     0m0.460s
    Wed Jun 19 18:39:48 UTC 2013

The example above re-indexed 4M of the 6M documents I have.
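The timing output above also gives a rough throughput figure for the plug-in. A quick calculation, treating the rounded 4M count as exact and using the real time reported by time (docs_per_second is an illustrative helper):

```python
def docs_per_second(docs: int, minutes: int, seconds: float) -> float:
    """Average throughput for a run timed as '<minutes>m<seconds>s'."""
    return docs / (minutes * 60 + seconds)


# "real 257m28.136s" for roughly 4M documents, from the output above
rate = docs_per_second(4_000_000, 257, 28.136)
print(f"{rate:.0f} docs/s, about {rate * 60:,.0f} docs/min")
```

That is on the order of 15,000 documents per minute, roughly 50 to 100 times the 300 and 150 documents/minute quoted at the start of the thread, although this run was against a different, 6M-document index on a test machine, so the comparison is only indicative.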

Any ideas why it goes wrong here?

Thanks,
Dmitry.


On Sunday, June 9, 2013 11:51:30 AM UTC+3, Vineeth Mohan wrote:
https://github.com/karussell/elasticsearch-reindex

Thanks
           Vineeth

On Sun, Jun 9, 2013 at 1:43 PM, doug livesey <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="luHZnxx6AKUJ">bio...@...> wrote:
It could be worth looking at the bulk operations -- we rebuild an admittedly much smaller index by using the bulk API & loading 2000 documents in each operation.



Re: How to reindex ElasticSearch quickly?

Radu Gheorghe-2
One more thing that would probably help is disabling refresh on your new index during the import (set refresh_interval to -1 in its settings), and then changing refresh_interval back to whatever suits you once the import is done (the default is 1s, i.e. auto-refreshing every second).
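A minimal sketch of the two settings updates described above, built as JSON bodies for a PUT to the index's _settings endpoint (the standard Elasticsearch index settings update). The helper name refresh_settings is hypothetical, the index name is taken from the curl command earlier in the thread, and the requests are only printed here, not sent.

```python
import json

def refresh_settings(index: str, interval: str) -> tuple[str, str]:
    """Return (path, JSON body) for a PUT /<index>/_settings request.

    interval="-1" disables periodic refresh; "1s" is the default.
    """
    path = f"/{index}/_settings"
    body = json.dumps({"index": {"refresh_interval": interval}})
    return path, body

# Before the bulk import: turn periodic refresh off on the new index.
before = refresh_settings("my_index_2013_06_19_reindexed", "-1")
# After the import: restore the default one-second refresh.
after = refresh_settings("my_index_2013_06_19_reindexed", "1s")

for path, body in (before, after):
    print("PUT", path, body)
```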

This also means you don't need to do any refreshes from your pyes app. By the way, in pyes, bulks are sent automatically every bulk_size documents (bulk_size is specified when you create the connection object). If you need to flush the bulk yourself, there's something like conn.flush() (I don't remember the exact name) that does it. You probably want to call that when your script is done, although in theory the bulk should be flushed on exit.
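pyes does this batching internally. The sketch below uses plain Python and makes no pyes calls; it only illustrates the idea behind bulk_size and the final flush. It groups documents into bulk API request bodies (newline-delimited action/source line pairs) of at most bulk_size documents each. It then emits whatever is left over at the end, which corresponds to the "flush on exit" case. The function name bulk_bodies and the sample documents are hypothetical. The _type field in the action line matches the 2013-era API used in this thread; later Elasticsearch versions drop it.

```python
import json
from typing import Iterable, Iterator

def bulk_bodies(docs: Iterable[tuple[str, dict]], index: str, doc_type: str,
                bulk_size: int) -> Iterator[str]:
    """Yield bulk API request bodies holding at most bulk_size documents each."""
    lines: list[str] = []
    count = 0
    for doc_id, source in docs:
        # Each document is an action line followed by its source line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
        count += 1
        if count == bulk_size:
            yield "\n".join(lines) + "\n"  # a bulk body must end with a newline
            lines, count = [], 0
    if lines:
        # The final partial batch: the equivalent of flushing on exit.
        yield "\n".join(lines) + "\n"

docs = ((str(i), {"n": i}) for i in range(5))
batches = list(bulk_bodies(docs, "my_index_2013_06_19_reindexed", "my_type", bulk_size=2))
print(len(batches))  # two full batches of 2 documents plus one partial batch of 1
```

Without the final flush step, the last partial batch would never be sent.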


On Thu, Jun 20, 2013 at 9:47 AM, Dmitry Babitsky <[hidden email]> wrote:



--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene
