ES 1.3.4 scrolling never ends

Yarden Bar
Hi all,

I'm encountering strange behavior when executing a search-scroll on a single node of ES 1.3.4 with the Java client.

The scenario is as follows:
  1. Start a single node of version 1.3.4
  2. Add a snapshot repository pointing to version 1.1.1 snapshots
  3. Restore a version 1.1.1 snapshot to the 1.3.4 node
  4. Execute a search on an index with
    client.prepareSearch("my_index*")
      .setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.queryFilter(
        QueryBuilders.queryString(s"$terms AND snapshotNo:[${mdp.fromSnapshot} TO ${mdp.toSnapshot}]"))))
      .addFields(OBFields.values.map(_.toString).toList: _*)
      .setSize(pageSize)
      .addSort(OBFields.updateNo.toString, SortOrder.ASC)
      .setScroll(TimeValue.timeValueMinutes(3))
      .execute().actionGet()
  5. Execute the following search scroll
    client.prepareSearchScroll(scrollId).setScroll(TimeValue.timeValueMinutes(3)).execute().actionGet()

I have a loop iterating over #5, providing the same scrollId and checking for (result.getHits().getHits().length == 0) to terminate.
I keep getting the same result 'page' with the same number of results.

Any ideas?


Thanks,
Yarden

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/be0f385c-9d46-492b-a818-9bb04c92b214%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: ES 1.3.4 scrolling never ends

InquiringMind
You need to get the scroll ID from each response and use that one in the subsequent scan search. You cannot simply reuse the same scroll ID.

Brian
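
The advice above can be sketched without a cluster. The following is a minimal Java sketch, assuming a toy in-memory scroll: `FakeScroll` and `Page` are hypothetical stand-ins and not part of the Elasticsearch client, and the scroll ID here simply encodes the offset of the next page, which is only a model of the behavior. The point is the loop shape: each request passes on the ID from the most recent response, and an empty page ends the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScrollLoopSketch {

    /** One page of a scroll: the hits plus the ID to send with the next request. */
    static final class Page {
        final String scrollId;
        final List<Integer> hits;

        Page(String scrollId, List<Integer> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical in-memory stand-in for a scrolling search (not the Elasticsearch API). */
    static final class FakeScroll {
        private final List<Integer> docs;
        private final int pageSize;

        FakeScroll(List<Integer> docs, int pageSize) {
            this.docs = docs;
            this.pageSize = pageSize;
        }

        /** Plays the role of the initial search request. */
        Page start() {
            return pageFrom(0);
        }

        /** Plays the role of a scroll request; in this toy model the ID is the next offset. */
        Page next(String scrollId) {
            return pageFrom(Integer.parseInt(scrollId));
        }

        private Page pageFrom(int offset) {
            int end = Math.min(offset + pageSize, docs.size());
            return new Page(String.valueOf(end), new ArrayList<>(docs.subList(offset, end)));
        }
    }

    /** Drains the scroll, always passing on the ID from the most recent response. */
    static List<Integer> drain(FakeScroll scroll) {
        List<Integer> all = new ArrayList<>();
        Page page = scroll.start();
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = scroll.next(page.scrollId); // never the ID saved from the first response
        }
        return all;
    }

    public static void main(String[] args) {
        FakeScroll scroll = new FakeScroll(Arrays.asList(1, 2, 3, 4, 5), 2);
        System.out.println(drain(scroll));
    }
}
```

In this model, calling `next` repeatedly with the ID from the first response would return the same page every time, which is the symptom described in the question.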

Re: ES 1.3.4 scrolling never ends

Yarden Bar
I'll try that and report....

Thanks,
Yarden

Re: ES 1.3.4 scrolling never ends

Yarden Bar

Update:
Only when I set the SearchType to something other than QUERY_AND_FETCH does the scroll finish successfully.

Any idea why QUERY_THEN_FETCH (the default) brings me to an endless loop?

The full code is:
val client = ESClientFactory.createByNode(ESNode.Builder, cluster = "test_acm_es")

val query = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.queryFilter(
  QueryBuilders.queryString("((market:2 AND feed:55) OR (market:2 AND feed:32)) AND snapshotNo:[12614 TO 12627]")))

var result: SearchResponse = client.prepareSearch("orderbook-2014.10.21")
  .setQuery(query)
  .addFields(OBFields.values.map(_.toString).toList: _*)
  .setSearchType(SearchType.QUERY_THEN_FETCH)
  .setSize(1000)
  .addSort(OBFields.updateNo.toString, SortOrder.ASC)
  .setScroll(new Scroll(TimeValue.timeValueMinutes(5)))
  .execute().actionGet()

println(s"Result total hits:${result.getHits.totalHits()}")
println(s"Result hits:${result.getHits.getHits().length}")
var itr = 0 // iteration counter; its declaration was missing from the snippet as posted
do {
  // result = new SearchScrollRequestBuilder(client,result.getScrollId).setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet()
  // each scroll request uses the scroll ID returned by the previous response
  result = client.prepareSearchScroll(result.getScrollId).setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet()
  println(s"Iteration=$itr, scrollResult=${result.getHits.getHits.length}")
  itr += 1
} while (result.getHits.getHits.length != 0)


Thanks for any idea...
Yarden

Re: ES 1.3.4 scrolling never ends

Yarden Bar
One issue I identified is that the heap size was too small for the query. I've increased the heap memory and the CircuitBreakerException stopped happening.

But the scrolling is still returning the SAME result.

An updated code example is below:
import org.elasticsearch.action.search.SearchType
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.settings.ImmutableSettings
import org.elasticsearch.common.transport.InetSocketTransportAddress
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
import org.elasticsearch.search.Scroll
import org.elasticsearch.search.sort.SortOrder

val es_settings = ImmutableSettings.settingsBuilder().put("transport.sniff", true).put("cluster.name", "test_acm_es").build()
var client = new TransportClient(es_settings).addTransportAddress(new InetSocketTransportAddress("myServer", 9300))
val query = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.queryFilter(
  QueryBuilders.queryString("((market:2 AND feed:55) OR (market:2 AND feed:32))")))
var result = client.prepareSearch("orderbook-2014.11.03")
  .setTypes(List("level"): _*)
  .setQuery(query)
  .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
  .setSize(10000)
  .addSort("updateNo", SortOrder.ASC)
  .setScroll(new Scroll(TimeValue.timeValueMinutes(5)))
  .get()
var scrollId = ""
var itr = 0
do {
  // take the scroll ID from the latest response before requesting the next page
  scrollId = result.getScrollId
  result = client.prepareSearchScroll(scrollId).setScroll(TimeValue.timeValueMinutes(3)).get()
  println(s"Iteration=$itr, scrollResult=${result.getHits.getHits.length}")
  // println("------------------------------------")
  // result.getHits.getHits.foreach(h => println(h.getId))
  // println("------------------------------------")
  itr += 1
} while (result.getHits.getHits.length != 0)

Enabling the print block reveals that the SearchHit array is the same for each iteration...

Thanks,
Yarden

Re: ES 1.3.4 scrolling never ends

joergprante@gmail.com
You must initiate scan/scroll with search type SCAN. The scan/scroll pattern is like this:

// `request` belongs to surrounding code not shown here; request.getTimeout() supplies the scroll keep-alive
SearchRequest searchRequest = new SearchRequestBuilder(client).setQuery(QueryBuilders.matchAllQuery()).request();
searchRequest.searchType(SearchType.SCAN).scroll(request.getTimeout());
SearchResponse searchResponse = client.search(searchRequest).actionGet();
// get total hits here before entering the loop
while (searchResponse.getScrollId() != null) {
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId())
                            .setScroll(request.getTimeout()).execute().actionGet();
    long hits = searchResponse.getHits().getHits().length;
    if (hits == 0) {
        break; // an empty page means the scroll is exhausted
    }
    // process hits of a scroll here
}

Jörg

Re: ES 1.3.4 scrolling never ends

Yarden Bar
Hi Jörg,

I can't use the scan type because I need the documents sorted ascending on a field; scan returns the documents in the order they were indexed.

Thanks

Re: ES 1.3.4 scrolling never ends

joergprante@gmail.com
Scan does not really return the docs in the order they were indexed (it depends on how the index segments in the shards return the docs).

But anyway, you cannot scroll over a sorted result set.

Jörg

Re: ES 1.3.4 scrolling never ends

InquiringMind
A while back, I wrote my own post-query response sorting so that I could handle cases that Elasticsearch didn't. One case was sorting a scan query. I used a Java TreeSet class and could also limit it to the top 'N' (configurable) items. It is very, very quick, adding almost no overhead to the existing scan logic. And it supports an arbitrarily complex compound sort key, much like an SQL ORDER BY clause; it's very easy to construct.

Probably not useful for a normal user query, but it is very useful for an ad-hoc query in which I wish to scan across an indeterminately large result set but still sort the results. 

One of these days, it might make a good plug-in candidate. But I am not sure how to integrate it with the scan API, so for now it's just part of the Java client layer.

Brian
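
Brian's code is not shown in the thread, so the following is only a rough Java sketch of the idea he describes, under stated assumptions: the `Hit` class and its fields (`id`, `market`, `updateNo`) are hypothetical, chosen to echo field names seen earlier in the thread. A `TreeSet` with a compound `Comparator` keeps hits sorted as they arrive in arbitrary order, and trimming the last element after each insert caps the set at the top N.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TopNSorter {

    /** Hypothetical hit shape; the field names are assumptions for illustration. */
    static final class Hit {
        final String id;
        final int market;
        final long updateNo;

        Hit(String id, int market, long updateNo) {
            this.id = id;
            this.market = market;
            this.updateNo = updateNo;
        }
    }

    /**
     * Compound key, comparable to ORDER BY market ASC, updateNo ASC.
     * The id is the final tie-breaker so the set never treats two distinct hits as equal.
     */
    static final Comparator<Hit> ORDER = Comparator
            .comparingInt((Hit h) -> h.market)
            .thenComparingLong(h -> h.updateNo)
            .thenComparing(h -> h.id);

    private final TreeSet<Hit> best = new TreeSet<>(ORDER);
    private final int limit;

    TopNSorter(int limit) {
        this.limit = limit;
    }

    /** Called once per hit as unsorted pages arrive. */
    void add(Hit hit) {
        best.add(hit);
        if (best.size() > limit) {
            best.pollLast(); // drop the hit that sorts last, keeping at most `limit` entries
        }
    }

    /** Returns the retained hits in sort order. */
    List<Hit> sorted() {
        return new ArrayList<>(best);
    }

    public static void main(String[] args) {
        TopNSorter sorter = new TopNSorter(2);
        sorter.add(new Hit("c", 2, 30L));
        sorter.add(new Hit("a", 1, 20L));
        sorter.add(new Hit("b", 1, 10L));
        for (Hit h : sorter.sorted()) {
            System.out.println(h.id + " market=" + h.market + " updateNo=" + h.updateNo);
        }
    }
}
```

Each insert costs O(log N) in the size of the retained set, so memory stays bounded by N regardless of how many hits the scan returns.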

Re: ES 1.3.4 scrolling never ends

Yarden Bar
Finally, the issue was solved.

I forgot to mention that I had a Logstash output connected and its protocol was set to 'node', meaning that Logstash was part of my cluster.
Once I set the protocol to 'transport', scrolling was perfect!!

Credit to my team-leader for guidance....


Thanks everyone for your help
Yarden
