Bulk indexing slowing down

I am blocked on indexing documents into Elasticsearch. I am trying to index only ~3.7M docs into an ES index, and after indexing ~3.4M docs (taking around 3 GB of disk space), the indexing rate has dropped to approximately 10 docs/min, which is very worrying.

The index (mistakenly) has a single shard only, which I think could be causing a bottleneck somewhere.

Node config:
m3.large (7.5 GB RAM, 32 GB SSD storage)
ES_HEAP_SIZE: 1 GB (this is what I see in KOPF, which shows heap usage of ~400 MB out of ~1008 MB)
50 GB EBS volume attached to each node

We are using TransportClient to interact with ES. The BulkProcessor is configured with a bulk size of 5 MB, a flush interval of 2 min (kept long so that we avoid sending bulks smaller than 5 MB) and 6 concurrent requests. There can be ~10 bulk requests in parallel to ES.
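
To make the interaction of these thresholds concrete, here is a rough, dependency-free model of them. It is a simplified sketch and not the actual BulkProcessor code: the class BulkSettingsModel and its methods are hypothetical names, and it assumes the action-count threshold is disabled so that only size and time trigger a flush. In the Java client the three values would normally be passed through setBulkSize, setFlushInterval and setConcurrentRequests on BulkProcessor.builder.

```java
// Simplified model of the bulk thresholds described above.
// ASSUMPTION: this is illustrative only and is not how Elasticsearch implements BulkProcessor.
class BulkSettingsModel {
    static final long MB = 1024L * 1024L;

    final long bulkSizeBytes;        // size threshold (5 MB in the post)
    final long flushIntervalMillis;  // periodic flush interval (2 min in the post)
    final int concurrentRequests;    // bulks allowed in flight at once (6 in the post)

    BulkSettingsModel(long bulkSizeBytes, long flushIntervalMillis, int concurrentRequests) {
        this.bulkSizeBytes = bulkSizeBytes;
        this.flushIntervalMillis = flushIntervalMillis;
        this.concurrentRequests = concurrentRequests;
    }

    // A bulk is sent when the buffer reaches the size threshold, or when the
    // interval has elapsed and at least one document is waiting.
    boolean shouldFlush(long bufferedBytes, int bufferedDocs, long millisSinceLastTick) {
        boolean sizeReached = bufferedBytes >= bulkSizeBytes;
        boolean timerFired = bufferedDocs > 0 && millisSinceLastTick >= flushIntervalMillis;
        return sizeReached || timerFired;
    }

    // Rough upper bound on request payload in flight from one processor.
    long maxInFlightBytes() {
        return (long) concurrentRequests * bulkSizeBytes;
    }

    public static void main(String[] args) {
        BulkSettingsModel m = new BulkSettingsModel(5 * MB, 2 * 60 * 1000L, 6);
        System.out.println("flush at 4 MB after 1 s: " + m.shouldFlush(4 * MB, 100, 1000L));
        System.out.println("flush at 5 MB after 1 s: " + m.shouldFlush(5 * MB, 100, 1000L));
        System.out.println("max in flight: " + (m.maxInFlightBytes() / MB) + " MB");
    }
}
```

Under these assumptions a single processor keeps roughly 30 MB of bulk payload in flight at most, which is small compared with the ~1008 MB heap.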

After seeing the indexing rate slow down, I changed the cluster settings to threadpool.bulk.size: 2 and threadpool.bulk.queue_size: 80. I also turned off refreshes for my index. number_of_replicas has also been set to 0 now (earlier it was 1).
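
For clarity, the settings changes above can be written out as plain key/value maps. This is only a sketch: the class IndexingSettingsSketch and its methods are hypothetical, and it assumes that refreshes were disabled by setting index.refresh_interval to -1, which the post does not state explicitly. Maps like these could presumably be handed to the client's update-settings calls, but that should be checked against the client version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: class and method names are hypothetical.
class IndexingSettingsSketch {

    // Cluster-level thread pool settings quoted in the post.
    static Map<String, Object> clusterSettings() {
        Map<String, Object> settings = new LinkedHashMap<String, Object>();
        settings.put("threadpool.bulk.size", 2);
        settings.put("threadpool.bulk.queue_size", 80);
        return settings;
    }

    // Index-level settings.
    // ASSUMPTION: "refreshes turned off" means index.refresh_interval = -1.
    static Map<String, Object> indexSettings() {
        Map<String, Object> settings = new LinkedHashMap<String, Object>();
        settings.put("index.refresh_interval", "-1");
        settings.put("index.number_of_replicas", 0);
        return settings;
    }

    // Rough worst case: every queued entry holds a full-size bulk request.
    static long worstCaseQueuedMb(int queueSize, long bulkSizeMb) {
        return queueSize * bulkSizeMb;
    }

    public static void main(String[] args) {
        System.out.println(clusterSettings());
        System.out.println(indexSettings());
        System.out.println(worstCaseQueuedMb(80, 5) + " MB");
    }
}
```

With a 5 MB bulk size, a completely full queue of 80 could hold roughly 400 MB of request payload in the worst case, which is a large share of the ~1008 MB heap.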

The KOPF dashboard shows CPU usage below 14%.

Please help as soon as possible. Thanks!