Do unique/reusable _scroll_ids exist?


Oli McCormack
Hi guys,

I'm attempting to implement pagination for our application. The catch is that our documents require a little post-query filtering, so if a user requests 500 documents, we scroll, get 500 from ES, filter, and sometimes end up with fewer. In that case we perform the next scroll, get more results, and keep building until we have 500 valid docs.
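
The fill-until-full loop described above could be sketched roughly as follows. This is only an illustration under assumptions: `take_valid`, `flatten`, the `ok` field and the stand-in batches are hypothetical names invented here, not part of Elasticsearch or of the application in question, and the batches would in practice come from successive scroll responses.

```python
from typing import Callable, Iterator, List


def take_valid(hits: Iterator[dict],
               is_valid: Callable[[dict], bool],
               page_size: int) -> List[dict]:
    """Collect up to page_size hits that pass is_valid, reading no further than needed."""
    page: List[dict] = []
    if page_size <= 0:
        return page
    for hit in hits:
        if is_valid(hit):
            page.append(hit)
            if len(page) == page_size:
                break  # unread hits stay in the iterator for the next page
    return page


def flatten(batches: Iterator[List[dict]]) -> Iterator[dict]:
    """Turn a stream of scroll batches into a stream of single hits."""
    for batch in batches:
        yield from batch


# Stand-in data (assumption): five batches of four docs; every third doc fails the filter.
batches = [[{"id": i, "ok": i % 3 != 0} for i in range(s, s + 4)]
           for s in range(0, 20, 4)]
hits = flatten(iter(batches))

page1 = take_valid(hits, lambda d: d["ok"], 5)
page2 = take_valid(hits, lambda d: d["ok"], 5)
print([d["id"] for d in page1])
print([d["id"] for d in page2])
```

Because `hits` is a single shared iterator, docs left over from a partly consumed batch are not lost: the second page starts where the first one stopped.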

I had some related questions about scrolling / the scroll id returned by search scroll requests.

Question 1: Is it possible to use the same scroll id multiple times to get the same set of results from the overall result set?

Question 2 (related to Question 1): I'm confused by the _scroll_id returned when doing a scan search and then scrolling. What I see is that when I start scrolling, for a period of time I get the same _scroll_id back. After some number of requests it changes. I would have expected to either (1) get the same _scroll_id over and over or (2) get a different _scroll_id each time. Is either of these correct? At the bottom of this mail I've given a short example set of req/resp.

Any pointers on this appreciated. I'd also be interested in hearing from anyone who has successfully implemented pagination and the approach you took.

Cheers,
oli

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Do unique/reusable _scroll_ids exist?

Oli McCormack
Ah, I never gave my example. In case it's of use:

Request 1
>> curl -XPOST 'localhost:9200/foo/bar/_search?search_type=scan&scroll=10m&size=10' \
        -d '{"query":{"constant_score":{"boost":1,"filter":{"term":{"x":false}}}}}'

{"_scroll_id":"abc","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[]}}

Request 2
>> curl -XPOST 'localhost:9200/_search/scroll?scroll=10m' \
        -d 'abc'

{"_scroll_id":"abc","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[{..}, {..}]}}

.. after some number of requests

{"_scroll_id":"def","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[{..}, {..}]}}


On Wednesday, June 5, 2013 8:48:43 AM UTC-7, Oli wrote:

Re: Do unique/reusable _scroll_ids exist?

sujoysett
Hi,

I faced this issue quite a while ago when implementing custom client-side code for ES data backup and re-indexing.
Yes, the scroll ID stays the same for a few requests, after which it changes.

The solution is to take the scroll ID returned with each response and use it in the next request, i.e. chain the IDs from one request to the next.

Using the first scroll ID repeatedly fetches only a few of the results, not all of them.
My guess is that the scroll ID gets renewed after some timestamp expires (calculated from the time of the first hit). But that is based on casual observation and I am not sure of it; ES experts can explain the underlying cause better. I would be glad to know the actual cause too.
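
The ID chaining described above might look something like this minimal sketch. It makes several assumptions: `scroll_batches` and `FakeScroll` are hypothetical names, `FakeScroll` is an in-memory stand-in rather than a real client, and its rejection of stale IDs is a simplification (the real server's behaviour may differ). The response shape follows the example earlier in the thread, and an empty batch is treated as the end of the scroll.

```python
from typing import Callable, Iterator, List


def scroll_batches(first_scroll_id: str,
                   fetch_next: Callable[[str], dict]) -> Iterator[List[dict]]:
    """Yield batches of hits, feeding each response's _scroll_id into the next request."""
    scroll_id = first_scroll_id
    while True:
        resp = fetch_next(scroll_id)
        hits = resp["hits"]["hits"]
        if not hits:
            return  # an empty batch is taken to mean the scroll is exhausted
        yield hits
        scroll_id = resp["_scroll_id"]  # chain: always use the most recent ID


class FakeScroll:
    """In-memory stand-in for the scroll endpoint (hypothetical, for illustration)."""

    def __init__(self, docs: List[dict], size: int) -> None:
        self.docs, self.size, self.pos = docs, size, 0
        self.current_id = "abc"

    def fetch_next(self, scroll_id: str) -> dict:
        if scroll_id != self.current_id:
            raise ValueError("stale scroll id")  # simplification of the fake
        batch = self.docs[self.pos:self.pos + self.size]
        self.pos += self.size
        if self.pos >= 2 * self.size:
            self.current_id = "def"  # the ID changes after a couple of requests
        return {"_scroll_id": self.current_id, "hits": {"hits": batch}}


server = FakeScroll([{"n": i} for i in range(5)], size=2)
for batch in scroll_batches("abc", server.fetch_next):
    print(batch)
```

The loop never needs to know whether the ID changed; it simply carries forward whatever the last response returned.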


- Sujoy.

On Wednesday, June 5, 2013 9:20:34 PM UTC+5:30, Oli wrote:

Re: Do unique/reusable _scroll_ids exist?

Oli McCormack
Thanks Sujoy, appreciate you getting back to this.

I've found that to be the solution. The unfortunate thing for me is that I'd like to be able to re-fetch the results for a specific position in a scroll.

Given that a token appears to always yield the next set of results (at least for all of the results that token represents), it seems like I can't ever re-fetch something I've already obtained once. Any information to the contrary of that would be great to hear!
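
One possible workaround, which nobody in this thread has proposed and which is offered here only as an assumption-laden sketch, is to keep the pages on the client as they are read, so an earlier position can be served again without going back to the scroll. `PageCache` is a hypothetical name, and the batches passed in stand for whatever the scroll loop yields.

```python
from typing import Iterator, List


class PageCache:
    """Remember batches as they are read so earlier positions can be served again."""

    def __init__(self, batches: Iterator[List[dict]]) -> None:
        self._batches = batches
        self._pages: List[List[dict]] = []

    def page(self, index: int) -> List[dict]:
        """Return page `index` (0-based), reading forward only as far as needed."""
        if index < 0:
            raise IndexError("page index must be non-negative")
        while len(self._pages) <= index:
            batch = next(self._batches, None)
            if batch is None:
                raise IndexError(f"page {index} is past the end of the scroll")
            self._pages.append(batch)
        return self._pages[index]


# Stand-in batches (assumption): in practice these would come from the scroll.
pages = PageCache(iter([[{"n": 0}, {"n": 1}], [{"n": 2}]]))
second = pages.page(1)       # reads forward through both batches
first_again = pages.page(0)  # served from memory, no new request
print(second, first_again)
```

The trade-off is memory: everything read so far stays on the client, so this only suits result sets small enough to hold there.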

- oli 


On Mon, Jun 10, 2013 at 2:27 AM, Sujoy Sett <[hidden email]> wrote: