performance issue with script scoring with fields having a large array

NM
I have documents with a field containing a large array.

I would like to score according to the value of the nth element of that array, but I get a very slow response (5 s) with only 10K documents indexed.

My mapping:
document {
id: value,  
field2: string,
field3: [ int_1,int_2, ... , int_10k] <- large array of 10K integers
}

Assume I generated and indexed 10K documents with 1K random integer values in the field 'field3'.

I then use the following search query:

GET /test/document/_search
{
  "query": {
    "function_score": {
      "script_score": {
        "script": "_source.field3[12] * _source.field3[11]"
      }
    }
  }
}

=> got 5000 ms

However, with basic Java objects and a simple nested loop:

- for all the documents
  score[i] =  doc[i].fields[12] * doc[i].fields[11] 
- sort by score

=> got < 50 ms
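
For reference, the baseline above could look roughly like this in Java. This is only a minimal sketch, assuming each document is held in memory as a plain int[] standing in for field3; the class and method names (ScoreLoop, score, rankByScore) are made up for illustration, and the timing printed by main will vary by machine.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical in-memory baseline: each document is just an int[] standing in for field3.
class ScoreLoop {

    // Score of one document: product of the elements at positions 12 and 11.
    static long score(int[] field3) {
        return (long) field3[12] * field3[11];
    }

    // Returns document indices ordered by descending score.
    static Integer[] rankByScore(int[][] docs) {
        long[] scores = new long[docs.length];
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < docs.length; i++) {
            scores[i] = score(docs[i]);
            order[i] = i;
        }
        // Sort the indices by score, highest first.
        Arrays.sort(order, (a, b) -> Long.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        // 10K documents with 1K random integers each, as described in the post.
        Random rnd = new Random(42);
        int[][] docs = new int[10_000][1_000];
        for (int[] d : docs) {
            for (int j = 0; j < d.length; j++) {
                d[j] = rnd.nextInt(1000);
            }
        }
        long start = System.nanoTime();
        Integer[] order = rankByScore(docs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("top doc: " + order[0] + ", elapsed ms: " + elapsedMs);
    }
}
```

Note that this loop only touches two integers per document and never deserializes anything, which is a large part of why it is so cheap.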

ES is 100 times slower than a simple loop.

How can I get similar performance with ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/db53da70-4f75-4088-b9a6-2cde3caef062%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: performance issue with script scoring with fields having a large array

Radu Gheorghe-2
Hello,

Using _source for scripts is typically slow, because ES has to load the stored document for every hit and parse it to extract the fields. A faster approach is to use something like doc['field3'].values[12], which will use the field data cache (already loaded in memory, at least after the first run).

More details about field data can be found here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.htm
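
Applied to the query in the question, the script would become something like doc['field3'].values[12] * doc['field3'].values[11]. One thing worth verifying first: field data may not preserve the original array order (multi-valued fields are typically returned sorted), so index 12 may not refer to the element you expect.

To illustrate why the two access paths differ so much in cost, here is a toy Java model. It is not Elasticsearch code and makes no claim about ES internals; the class SourceVsFieldData and its methods are hypothetical, and the stored document is assumed to be a simple comma-separated string. It only contrasts re-parsing a whole stored array for each document with reading two values from an array that was loaded once.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Elasticsearch code: per-document parsing vs. preloaded values.
class SourceVsFieldData {

    // Stand-in for reading the stored source: the array serialized as "1,2,3,...".
    static int[] parseSource(String storedSource) {
        String[] parts = storedSource.split(",");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return values;
    }

    // _source-style scoring: the whole array is parsed for every document scored.
    static long[] scoreFromSource(List<String> storedDocs) {
        long[] scores = new long[storedDocs.size()];
        for (int i = 0; i < scores.length; i++) {
            int[] field3 = parseSource(storedDocs.get(i));
            scores[i] = (long) field3[12] * field3[11];
        }
        return scores;
    }

    // Cache-style scoring: values are already in memory, two reads per document.
    static long[] scoreFromCache(int[][] cachedField3) {
        long[] scores = new long[cachedField3.length];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = (long) cachedField3[i][12] * cachedField3[i][11];
        }
        return scores;
    }

    public static void main(String[] args) {
        // Build 10K stored documents with 1K integers each.
        List<String> stored = new ArrayList<>();
        for (int d = 0; d < 10_000; d++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 1_000; j++) {
                if (j > 0) sb.append(',');
                sb.append((d + j) % 1000);
            }
            stored.add(sb.toString());
        }
        // Load the "cache" once, up front.
        int[][] cache = new int[stored.size()][];
        for (int d = 0; d < cache.length; d++) {
            cache[d] = parseSource(stored.get(d));
        }
        long t0 = System.nanoTime();
        scoreFromSource(stored);
        long t1 = System.nanoTime();
        scoreFromCache(cache);
        long t2 = System.nanoTime();
        System.out.println("per-document parsing ms: " + (t1 - t0) / 1_000_000);
        System.out.println("preloaded values ms: " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods compute the same scores; the difference is only in how much work is done per document before the two values can be read.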

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

