elastic bulk index rate


ghasem1992
I have a problem with my bulk indexing rate. As you can see in the following graph, after a while RAM usage starts to increase and the indexing rate starts to decrease rapidly.
Why?



Server specification:
CPU: Core i7
RAM: 12 GB

Elasticsearch configuration:
Bulk size: 30,000
ES_HEAP_SIZE: 5g
Nodes: 1
Number of shards: 50
Number of replicas: 0
bootstrap.mlockall: true
indices.merge.scheduler.max_thread_count: 1
indices.translog.flush_threshold_size: 1G
transient.indices.store.throttle.type: none
index.compound_on_flush: false
index.compound_format: false
index.warmer.enable: false
index.refresh_interval: -1

Data specification:
Number of records: 20,000,000


 

Re: elastic bulk index rate

Tri H Nguyen
I would reduce the bulk size to 10,000 and change the number of shards to 2, since this is a single-node cluster and you are indexing only 20M documents.

Each shard can hold around 2B documents, so 20M is not much. The more shards you have, the more resources ES consumes, so 2 shards are more than enough; 50 is far too many.
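
To put rough numbers on that, here is a small back-of-the-envelope calculation in Python. It assumes documents are spread evenly across shards and uses the approximate 2B-per-shard figure mentioned above; the names `docs_per_shard` and `APPROX_SHARD_CAPACITY` are just illustrative.

```python
# Rough per-shard document counts for the 20M-record data set.
# Assumption: documents are distributed evenly across shards.
TOTAL_DOCS = 20_000_000
APPROX_SHARD_CAPACITY = 2_000_000_000  # "around 2B" per shard, approximate


def docs_per_shard(total_docs: int, shards: int) -> int:
    """Return the approximate number of documents each shard would hold."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return total_docs // shards


if __name__ == "__main__":
    for shards in (1, 2, 50):
        per_shard = docs_per_shard(TOTAL_DOCS, shards)
        used = per_shard / APPROX_SHARD_CAPACITY
        print(f"{shards:>2} shard(s): {per_shard:>10,} docs each "
              f"({used:.4%} of approximate capacity)")
```

With 50 shards each one holds only about 400,000 documents, so the per-shard overhead is paid 50 times for very little data.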

If I were you, without knowing much about the data or the number of cores, I would do the following:
- index the same set with 1 shard, bulk size 10,000, and take a measurement
- index the same set with 1 shard, bulk size 20,000, and take a measurement
- index the same set with 1 shard, bulk size 30,000, and take a measurement

Then repeat with 2 shards. You'll find what is acceptable for your hardware and data set.
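
The measurement loop above could be sketched roughly like this in Python. This is only a harness under stated assumptions: `send_bulk_for` is a hypothetical hook that you would implement yourself to recreate the index with the given shard count and return a function that sends one batch to Elasticsearch. Here it is replaced by a no-op so the sketch runs without a cluster.

```python
import time
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def measure(docs, bulk_size, send_bulk):
    """Send docs in batches; return (docs_sent, seconds, docs_per_second)."""
    start = time.perf_counter()
    sent = 0
    for chunk in batches(docs, bulk_size):
        send_bulk(chunk)  # one bulk request per chunk
        sent += len(chunk)
    elapsed = time.perf_counter() - start
    rate = sent / elapsed if elapsed > 0 else float("inf")
    return sent, elapsed, rate


def run_matrix(make_docs, send_bulk_for,
               shard_counts=(1, 2), bulk_sizes=(10_000, 20_000, 30_000)):
    """Measure every (shards, bulk_size) combination on the same data set."""
    results = {}
    for shards in shard_counts:
        for size in bulk_sizes:
            # Hypothetical hook: recreate the index with `shards` shards
            # and return a callable that indexes one batch.
            send = send_bulk_for(shards)
            results[(shards, size)] = measure(make_docs(), size, send)
    return results


if __name__ == "__main__":
    def fake_sender_for(shards):
        # Stand-in for a real bulk request; does nothing.
        return lambda chunk: None

    def make_docs():
        return ({"id": i} for i in range(1_000))

    for (shards, size), (sent, secs, rate) in sorted(
            run_matrix(make_docs, fake_sender_for).items()):
        print(f"shards={shards} bulk={size}: {sent} docs in {secs:.4f}s")
```

Recreating the index before each run matters, otherwise later runs index into segments left over from earlier ones and the measurements are not comparable.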

Good luck.