Running _optimize, best practices

Milan Gornik
Hello,

I am wondering about best practices for running _optimize on an index. Our index currently has around 25% deleted_docs in it, and we have noticed performance degrading over time. Since this is a lot of space to reclaim, we would like to run _optimize, and we are hoping it will help with performance too. Before running it, though, we have some concerns:

- While it is running, can we get some insight into the progress of the operation?
- While _optimize is running, will operations be fully stopped, or just slower?
- Is there any particular type of operation we should avoid while running _optimize (e.g. document deletes)?
- Anything else we should keep in mind before running _optimize?

Thanks for your time!
Regards,
Milan Gornik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Re: Running _optimize, best practices

Zachary Tong
  • You can keep an eye on merges through the Segments API, or use the SegmentSpy plugin to visualize the segments as they merge.
  • Slowed, not stopped. Merges run asynchronously (as do most operations in ES). However, merging a lot of segments can be very CPU- and disk-intensive, and it is possible to saturate a node's resources, which can cause problems. You may want to enable store-level throttling on merges so you don't swamp your nodes.
  • Optimize simply tells your shards to merge until the number of segments == max_num_segments. Indexing or deleting docs while it runs makes the process hairier, since new segments are being added and docs are being marked as deleted.
  • Optimize by definition invalidates most of the values you have cached in memory, since caches are per-segment and you are merging all your segments together.

However, with all that said, I don't think Optimize is necessary. The presence of deleted docs shouldn't degrade performance; they are simply marked as deleted in memory and ignored. Deletes are removed whenever segments are merged, which ES/Lucene handles automatically. Optimize is usually recommended when you know an index is no longer going to receive new documents or deletes (e.g. old log data). Then it makes sense to optimize the index and set it aside.
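
To make the first point concrete, here is a minimal Python sketch of how one might summarize a Segments API response to watch the segment count and the share of deleted docs while merges run. The response layout (indices, shards, a list of shard copies, then segments with `num_docs` and `deleted_docs`) is my assumption about the JSON shape and may differ between ES versions; the index name and numbers in `sample` are made up for illustration.

```python
from typing import Any, Dict


def summarize_segments(response: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Return segment count and deleted-doc ratio per index.

    Assumed layout (not verified against every ES version):
    indices -> <index> -> shards -> <shard id> -> [shard copies] -> segments -> <name>
    Replica copies are counted too, so the totals cover every copy returned.
    """
    summary: Dict[str, Dict[str, float]] = {}
    for index_name, index_data in response.get("indices", {}).items():
        segments = 0
        live = 0
        deleted = 0
        for copies in index_data.get("shards", {}).values():
            for copy in copies:
                for seg in copy.get("segments", {}).values():
                    segments += 1
                    live += seg.get("num_docs", 0)
                    deleted += seg.get("deleted_docs", 0)
        total = live + deleted
        summary[index_name] = {
            "segments": segments,
            "live_docs": live,
            "deleted_docs": deleted,
            # Fraction of all docs in the segments that are marked deleted.
            "deleted_ratio": deleted / total if total else 0.0,
        }
    return summary


# Hypothetical response body, trimmed to the fields used above.
sample = {
    "indices": {
        "myindex": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_0": {"num_docs": 75, "deleted_docs": 25},
                            "_1": {"num_docs": 75, "deleted_docs": 25},
                        }
                    }
                ]
            }
        }
    }
}

print(summarize_segments(sample))
```

Polling the endpoint and feeding each response through a function like this should show the segment count dropping as merges complete, which is about as close to a progress indicator as you will get.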

If this is a "live" index, your Optimize call is going to be quickly undone by new docs and deletes.  If you are bothered by the number of deletes hanging around, you could try increasing the "index.reclaim_deletes_weight" to make deleted docs more "heavy" in the segment, forcing a merge sooner.
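
For reference, the knobs mentioned above can be sketched as plain request builders in Python. This only constructs URLs and JSON bodies and sends nothing. The host, index name, and the values `20mb` and `3.0` are hypothetical. The throttle keys `indices.store.throttle.type` and `indices.store.throttle.max_bytes_per_sec` are what I would expect to use for store-level throttling, and `index.reclaim_deletes_weight` is taken verbatim from the paragraph above; whether each can be changed on a running cluster depends on your ES version, so treat this as a sketch rather than a recipe.

```python
import json
from urllib.parse import urlencode


def optimize_url(base_url, index, max_num_segments=None, only_expunge_deletes=False):
    """Build the URL for a POST to the _optimize endpoint of one index."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    url = f"{base_url.rstrip('/')}/{index}/_optimize"
    return f"{url}?{urlencode(params)}" if params else url


def merge_throttle_settings(max_bytes_per_sec):
    """Body for a cluster settings update that throttles merge I/O (assumed keys)."""
    return {
        "transient": {
            "indices.store.throttle.type": "merge",
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


def reclaim_deletes_settings(weight):
    """Body for an index settings update; key copied from the post above."""
    return {"index.reclaim_deletes_weight": weight}


# Hypothetical host and index: a closed-off log index being merged down.
print(optimize_url("http://localhost:9200", "logs-2013.01", max_num_segments=1))
print(json.dumps(merge_throttle_settings("20mb")))
print(json.dumps(reclaim_deletes_settings(3.0)))
```

The split follows the advice above: the optimize URL is meant for an index that is no longer written to, while the throttle and reclaim settings are the gentler options for a live one.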

-Zach

In reply to Milan Gornik's message of Thursday, February 14, 2013 7:40:47 AM UTC-5.

Re: Running _optimize, best practices

Milan Gornik

Hi Zachary,

Thanks for your detailed reply! We checked our ES response times in the meantime and saw an improvement in performance just from having those excess documents deleted. I was wondering whether the space taken by deleted documents would be reclaimed, and based on your reply, that seems like something we shouldn't worry about. Since our system is live and has a lot of ES operations running at all times, it seems safer not to initiate a manual _optimize.

Thanks again!
Best regards,
Milan  


In reply to Zachary Tong's message of Thursday, February 14, 2013 2:46:58 PM UTC+1.

Re: Running _optimize, best practices

simonw-2
Hey Milan,

Just to give you some background on optimize, I recommend reading this: http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you
If you have questions, feel free to come back here to the list!

simon

In reply to Milan Gornik's message of Friday, February 15, 2013 4:44:35 PM UTC+1.