
Elastic Search Pagination


Praveen Kumar B
This post has NOT been accepted by the mailing list yet.
Hi,

I am trying to pull 192 million records out of an Elasticsearch index. I am interested only in the "_source" part, which I want to use in a separate Spark program. So my idea is to extract all 192 million records with only the "_source" part.

I tried to fetch the data in chunks as follows:

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 1, "size" : 25000}' > hist_data_1-25000.txt;

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 25001, "size" : 25000}' > hist_data_25001-50000.txt;

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 50001, "size" : 25000}' > hist_data_50001-75000.txt;

curl -XGET 'http://localhost:80/asset1/_search?scroll=2m&filter_path=hits.hits._source' -d '{"from" : 75001, "size" : 25000}' > hist_data_75001-100000.txt;

I can't filter on the date column because the date value is not present in all documents. So I had to depend on row-based pagination, and I chose the approach above.
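
For comparison, here is a minimal sketch of how a scroll cursor such as the one opened by `scroll=2m` is usually advanced. It is not a tested solution for this cluster. It assumes Elasticsearch 2.x or later, where follow-up pages are requested from `/_search/scroll` with a JSON body. The helper names `post_json` and `scroll_sources` and the page size of 1000 are made up for illustration. Two details differ from the commands above: `from` is left out because a scroll does not page by offset, and `_scroll_id` is added to `filter_path` because filtering on `hits.hits._source` alone would strip the cursor from the response.

```python
import json
import urllib.request


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scroll_sources(base_url, index, page_size=1000, keep_alive="2m", post=post_json):
    """Yield the _source of every document in `index`, one scroll page at a time.

    `post` is injectable so the paging logic can be exercised without a cluster.
    """
    # Keep _scroll_id in the filtered response; it is the cursor for the next page.
    fp = "filter_path=_scroll_id,hits.hits._source"

    # First request opens the scroll. Sorting by _doc is the cheapest order
    # when the order of the results does not matter.
    page = post(
        f"{base_url}/{index}/_search?scroll={keep_alive}&{fp}",
        {"size": page_size, "sort": ["_doc"]},
    )
    while True:
        hits = page.get("hits", {}).get("hits", [])
        if not hits:
            break  # an empty page means the scroll is exhausted
        for hit in hits:
            yield hit["_source"]
        # Follow-up requests pass the latest scroll id, not from/size.
        page = post(
            f"{base_url}/_search/scroll?{fp}",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

Under these assumptions, iterating over `scroll_sources("http://localhost:80", "asset1")` and writing `json.dumps(source)` for each item on its own line would produce a single file with one JSON document per line, which Spark can read directly.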

I am really not sure whether I am doing something wrong here. Can someone help me with this?

Thanks,
Praveen Kumar B