Elastic Search Pagination

Praveen Kumar B

I am trying to pull 192 million records out of an ES index. I am interested only in the "_source" part, which I want to use in another Spark program. So my idea is to extract those 192 million records with only the "_source" part.

I tried to get the data in chunks as follows:

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 1, "size" : 25000}' > hist_data_1-25000.txt;

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 25001, "size" : 25000}' > hist_data_25001-50000.txt;

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 50001, "size" : 25000}' > hist_data_50001-75000.txt;

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 75001, "size" : 25000}' > hist_data_75001-100000.txt;

I can't filter on the date column because the date value is not present in all the documents. So I had to depend on pagination by rows, and I chose the option above.
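
For comparison, here is a minimal sketch of how a scroll is usually driven. It is not a tested solution for this cluster. The first request opens the scroll without "from" (which is zero-based and is not how a scroll advances), and every later request passes back the "_scroll_id" returned by the previous one until a batch comes back empty. Note that "filter_path" has to keep "_scroll_id", otherwise there is nothing to continue with. Assumptions: an Elasticsearch version that accepts a JSON body on /_search/scroll (2.x or later, as far as I know); the function names, the batch size of 1000, and the output file name are made up for illustration.

```python
import json
import urllib.request


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scroll_sources(base_url, index, batch_size=1000, keep_alive="2m", post=post_json):
    """Yield the _source of every document in `index`, one batch per request.

    `post` is injectable so the paging logic can be exercised without a cluster.
    """
    # Keep _scroll_id in the filtered response; it is needed for the next page.
    fp = "filter_path=_scroll_id,hits.hits._source"

    # Opening request: no "from", sorted by _doc (index order, cheapest for a full export).
    page = post(
        f"{base_url}/{index}/_search?scroll={keep_alive}&{fp}",
        {"size": batch_size, "sort": ["_doc"], "query": {"match_all": {}}},
    )
    while True:
        # With filter_path, an empty batch may omit the "hits" key entirely.
        hits = page.get("hits", {}).get("hits", [])
        if not hits:
            break
        for hit in hits:
            yield hit["_source"]
        # Follow-up request: only the keep-alive and the latest scroll id.
        page = post(
            f"{base_url}/_search/scroll?{fp}",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )


# Hypothetical usage: write one JSON document per line.
#
# with open("hist_data.jsonl", "w") as out:
#     for src in scroll_sources("http://localhost:80", "asset1"):
#         out.write(json.dumps(src) + "\n")
```

Writing one JSON object per line is a deliberate choice in this sketch, since Spark's JSON reader accepts that layout directly; the per-range files above would each contain one large "hits" wrapper instead.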

I am really not sure whether I am doing anything wrong here. Can someone help me with this?

Praveen Kumar B