Reindexing Strategy

stratawing
I need to update my mappings and will therefore need to reindex, but I will be doing so directly from an existing ES index (I'm using ES as my data store). I've read other posts and think the following strategy will work, but I have a couple of questions:

Here's the strategy:

1 - create new index with new mappings and different index name
2 - extract data from old index (e.g., using scroll)
3 - bulk load from old index into new index
4 - confirm consistency between indices (how do I do this?)
5 - add 'alias' to new index to map old index name to new index
6 - delete old index.
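
Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested migration script: the helper name `hits_to_bulk_body` and the index name `myindex_v2` are made up, and it assumes each scroll hit carries `_id` and `_source` (plus `_type` on versions that still use types), as in the standard search response. Sending the body to the `_bulk` endpoint and fetching the next scroll page are left out.

```python
import json


def hits_to_bulk_body(hits, new_index):
    """Turn one page of scroll hits into a newline-delimited _bulk request body.

    `hits` is the list found under response["hits"]["hits"]; `new_index` is
    the (hypothetical) name of the index created with the new mappings.
    """
    lines = []
    for hit in hits:
        # Action line: index the document into the new index under its original id.
        action = {"_index": new_index, "_id": hit["_id"]}
        if "_type" in hit:
            # Carry the type over only when the source hit reports one.
            action["_type"] = hit["_type"]
        lines.append(json.dumps({"index": action}))
        # Source line: the original document, unchanged.
        lines.append(json.dumps(hit["_source"]))
    # The bulk format expects every line, including the last, to end with "\n".
    return ("\n".join(lines) + "\n") if lines else ""


# Example with a hand-written page of hits (no cluster needed):
page = [{"_id": "1", "_source": {"title": "a"}}]
print(hits_to_bulk_body(page, "myindex_v2"))
```

Reusing the original `_id` means a retried batch overwrites documents instead of duplicating them, which also makes a later count comparison meaningful.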

I have two questions regarding the strategy above:

1 - I'm not clear on how to confirm that the new index is fully consistent with the old index. Any suggestions? I understand that I may need to halt writes to the old index while the comparison is being performed.

2 - There will be a moment (however brief) where both the new index (with the alias) and the old index are getting requests. Once the alias is in place, will searches against both indices create duplicates in the result set (i.e., will the same doc show up twice)? If so, it might create problems in my application. It's probably a minimal concern, but I'd like to hear whether anyone else has had issues "swapping" indices using the alias functionality in this manner.

Many thanks in advance for your input.




Re: Reindexing Strategy

Ivan Brusic
For question 1, I am unsure about what your definition of
"consistency" might be. The availability of documents? The number of
documents/fields? The relevancy of certain searches? Changing your
mapping will affect your index and by definition, it will not be the
same as the original one. Consistency in the data world usually means
the consistent propagation of changes. Updating a mapping would not
affect that.

Question 2:
I use aliases for searching/incremental-updates, real index names for
full indexing. Once a new index is created, you can create an atomic
transaction that will remove the alias from the old index and create
an alias on the new one. Searches only know about the alias, never the
real index name.
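
As a rough sketch of that atomic swap: the `_aliases` endpoint accepts a list of actions in one request, so the remove and the add can be sent together. The function name and the alias/index names below are placeholders chosen for illustration, and the code only builds the request body; how it is posted to the cluster depends on your client.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for a single POST /_aliases call that moves `alias`.

    Both actions travel in one request, so searches that go through the
    alias see either the old index or the new one, never both.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Placeholder names: "myindex" is the alias, "_v1"/"_v2" are the real indices.
print(json.dumps(alias_swap_actions("myindex", "myindex_v1", "myindex_v2"), indent=2))
```

This assumes the application already searches through the alias rather than through a real index name, as described above.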

Cheers,

Ivan

On Wed, Aug 29, 2012 at 10:54 AM, stratawing <[hidden email]> wrote:

> [...]

Re: Reindexing Strategy

stratawing
Thanks Ivan!

Very helpful answer on question 2. Regarding question 1, I really just want to confirm that all documents from the old index were successfully transferred to the new index. In light of your comments, I believe the best approach is to compare the document and field counts between the two indices, and to check the response from each bulk action for write errors during the transfer. Unless I'm missing something, that should give me enough information to confirm that everything was transferred. Let me know if I'm mistaken.
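
For what it's worth, a minimal sketch of those two checks might look like the following. It works on already-parsed JSON responses, so no client is involved; the function names are invented for this example, and it assumes that a failed bulk item carries an `error` field and that the count response has a top-level `count` field.

```python
def failed_bulk_items(bulk_response):
    """Return (operation, doc id, error) for every failed item in a bulk response."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a one-key dict such as {"index": {...}} or {"create": {...}}.
        for op, result in item.items():
            if "error" in result:
                failures.append((op, result.get("_id"), result["error"]))
    return failures


def counts_match(old_count_response, new_count_response):
    """Compare the `count` fields of two _count responses."""
    return old_count_response["count"] == new_count_response["count"]


# Hand-written sample responses, for illustration only:
sample_bulk = {
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400, "error": "mapping failure"}},
    ]
}
print(failed_bulk_items(sample_bulk))
print(counts_match({"count": 10}, {"count": 10}))
```

Matching counts only show that the same number of documents arrived, so the comparison is most useful with writes to the old index paused and the new index refreshed first.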

Thanks again!


On Wednesday, August 29, 2012 6:30:25 PM UTC-4, Ivan Brusic wrote:
> [...]