bulk index request dataloss

mzrth_7810
Hey everyone,

I've been trying to maximise my indexing rate. I'm indexing around a million documents using 4 threads. Each thread sends 2500 documents per bulk request, so that's 10000 documents at a time.

I've been playing around with these parameters and found that if I push them higher and higher, I eventually start getting data loss: the index ends up with fewer than 1,000,000 documents every time. There are no errors in the logs, so I'm not sure what's causing this.

Taking these parameters down a notch fixes the problem.

Has anyone seen this issue before?
Is there anything that can be done about it?

Thank you

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/181a75a6-7a12-421e-9757-a82876b24a15%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: bulk index request dataloss

joergprante@gmail.com
Do you evaluate the bulk request responses?

Jörg
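
For anyone reading along, here is a minimal sketch of what evaluating a bulk response can look like. It assumes the JSON body returned by the REST `_bulk` endpoint has already been parsed into a dict with a top-level `errors` flag and an `items` list. The function name `failed_items`, the index name and the sample response are made up for illustration.

```python
def failed_items(bulk_response):
    """Return (action, item) pairs for every item that did not succeed."""
    failures = []
    # "errors" is true when at least one item failed; skip the scan otherwise.
    if not bulk_response.get("errors", False):
        return failures
    for entry in bulk_response.get("items", []):
        # Each entry has a single key naming the action (index, create, ...).
        for action, item in entry.items():
            # Assumption: any 4xx/5xx status or an "error" field means failure.
            if item.get("status", 200) >= 400 or "error" in item:
                failures.append((action, item))
    return failures


# Hypothetical response: one document indexed, one rejected.
sample = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "docs", "_id": "1", "status": 201}},
        {"index": {"_index": "docs", "_id": "2", "status": 429,
                   "error": "rejected (sample text)"}},
    ],
}

for action, item in failed_items(sample):
    print(action, item["_id"], item["status"])
```

If the list that comes back is not empty, those documents never made it into the index and have to be resent or at least logged on the client side.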

On Tue, Apr 7, 2015 at 11:16 AM, mzrth_7810 <[hidden email]> wrote:
Hey everyone,

I've been trying to maximise my indexing rate. [...]

Re: bulk index request dataloss

mzrth_7810
Turns out it was because the bulk thread pool queue size was too small, so any new requests were being rejected.

Is it common to set threadpool.bulk.queue_size to something like 1000?

On Tuesday, 7 April 2015 11:10:33 UTC+1, Jörg Prante wrote:
Do you evaluate the bulk request responses?

Jörg

Re: bulk index request dataloss

dadoonet
It would mean that you are going to accumulate up to 1000 requests of 2500 docs at a time in memory.
That could be a lot. You need to monitor that. That’s a lot of objects that might be GCed at some point.
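
A rough back-of-the-envelope for those numbers. The queue size and the documents per request come from this thread; the average document size of 1 KiB is purely an assumption for illustration, so plug in your own.

```python
def queued_bytes(queue_size, docs_per_request, avg_doc_bytes):
    """Upper bound on raw document bytes held by a completely full queue."""
    return queue_size * docs_per_request * avg_doc_bytes


# 1000 queued requests of 2500 docs each; 1024 bytes per doc is assumed.
total = queued_bytes(queue_size=1000, docs_per_request=2500, avg_doc_bytes=1024)
print(round(total / 1024 ** 3, 2), "GiB")
```

With these inputs a full queue holds roughly 2.4 GiB of raw document source, before counting any per-object overhead on the heap.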

If your bulk request is rejected, why not try to slow down the injection rate instead of filling up memory?
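
One way to slow down on rejection is to resend only the rejected actions after a growing pause. This is a sketch, not any client's built-in behaviour: `send_bulk` is a hypothetical callable that takes a list of actions and returns one HTTP-like status per action, and it assumes rejected items come back with status 429.

```python
import time


def send_with_backoff(send_bulk, actions, max_retries=5, base_delay=0.5,
                      sleep=time.sleep):
    """Send actions, retrying only those rejected with status 429.

    Returns (still_rejected, failed): actions that were still rejected after
    the last retry, and actions that failed with any other 4xx/5xx status.
    """
    pending = list(actions)
    failed = []
    for attempt in range(max_retries + 1):
        statuses = send_bulk(pending)
        rejected = []
        for action, status in zip(pending, statuses):
            if status == 429:
                rejected.append(action)   # queue was full, try again later
            elif status >= 400:
                failed.append(action)     # not retried here, report to caller
        pending = rejected
        if not pending:
            break
        if attempt < max_retries:
            # Exponential backoff: wait longer after each rejected attempt.
            sleep(base_delay * (2 ** attempt))
    return pending, failed
```

Anything left in either returned list still has to be handled by the caller, otherwise the documents are lost just as quietly as before.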

You could also consider setting replicas to 0 before the bulk load and then setting them back to 1 after injection.
Having SSD drives can also help, but maybe you already have that?
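
For the replica switch, a small sketch that only builds the request for the update index settings API and does not send anything. The helper name and the index name `docs` are made up; how you send the request depends on your client.

```python
import json


def replica_settings_request(index, replicas):
    """Build method, path and JSON body for an update-index-settings call."""
    path = "/%s/_settings" % index
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", path, body


# "docs" is a hypothetical index name.
print(replica_settings_request("docs", 0))  # before the bulk load
print(replica_settings_request("docs", 1))  # after the bulk load
```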


My 2 cents

-- 
David Pilato - Developer | Evangelist

On 23 Apr 2015, at 12:20, mzrth_7810 <[hidden email]> wrote:

Turns out it was because the bulk thread pool queue size was too small, any new requests were being rejected.

Is it common to set threadpool.bulk.queue_size to something like 1000 ?

Re: bulk index request dataloss

joergprante@gmail.com
In reply to this post by mzrth_7810
With the JDBC plugin, you should slightly increase the number of actions per bulk request ("maxbulkactions") in order to keep the number of concurrent bulk requests low enough for ES to handle.

The default ES bulk thread pool setting is OK. Please avoid changing it.

Jörg
 

On Thu, Apr 23, 2015 at 12:20 PM, mzrth_7810 <[hidden email]> wrote:
Turns out it was because the bulk thread pool queue size was too small, any new requests were being rejected.

Is it common to set threadpool.bulk.queue_size to something like 1000 ?

Re: bulk index request dataloss

mzrth_7810
@David, that makes sense. We have SSDs and replicas are already set to 0 while bulk indexing.

@Jörg, we haven't changed "threadpool.bulk.size" because according to the docs it's directly related to the number of processors available. However, "threadpool.bulk.queue_size" has been modified. I'm slowly tuning it down to find a sweet spot, but the default seems a bit too low.

On Thursday, 23 April 2015 12:16:03 UTC+1, Jörg Prante wrote:
With the JDBC plugin, you should slightly increase the requests per bulk request ("maxbulkactions") in order to keep your concurrent bulk requests low enough to get handled by ES.

The ES bulk thread pool default setting is ok. Please avoid a change.

Jörg
 
