Custom Score Query and Sort questions

John Chang
My application needs to have returned hits ordered either by a text field or a date field.  I've looked at the Custom Score Query doc (http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query/) and the Sort doc (http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/sort/), tried them out, and searched the forum.  I'm afraid I'm still wondering:

1) The custom_score queries with a script seem to sort based on the script criteria as well (correct me if I am wrong).  So, aside from performance differences, what are the functional differences between a custom score query with a script and sorting?

2) Is it the case that a custom_score query with a script actually changes the score values, whereas a sort will not change score values but just return hits in a different order?  I suspected this from the docs, but I'm having trouble testing the idea because all my scores come back 0.0 for each hit in my tests.

3) The Sort doc reads, "Note, it is recommended, for single custom based script based sorting, to use custom_score query instead as sorting based on score is faster."  So, when would one want (or need) to use sort over custom_score queries with a script to get ordered results?  (Perhaps the answers to the above answer this.)

4) One of my searches needs to have results ordered alphabetically by a name field (if present), else by an email field.  Is it correct to believe this would have to be handled by a custom_score with a script (as I need if-else logic) and a simple sort won't work?

5) The scripting module doc (http://www.elasticsearch.com/docs/elasticsearch/modules/scripting/) lists fields of type short, string, double, date, long, etc.  If I need results ordered by date, what is the best way to store that field from a performance perspective?  

6) Does sharding impact ordered search performance?

7) Are there any other important performance considerations for ordering results through Elasticsearch I should be aware of (aside from the standard Lucene considerations)?

As always, thanks so much for your time and for an awesome technology!

Re: Custom Score Query and Sort questions

kimchy
Administrator
On Wed, Oct 6, 2010 at 7:34 PM, John Chang <[hidden email]> wrote:

My application needs to have returned hits ordered either by a text field or
a date field.  I've looked at the Custom Score Query doc
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query/)
and the Sort doc
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/sort/),
tried them out, and searched the forum.  I'm afraid I'm still wondering:

1) The custom_score queries with a script seem to sort based on the script
criteria as well (correct me if I am wrong).  So, aside from performance
differences, what are the functional differences between a custom score query
with a script and sorting?

The custom score query lets you provide a custom calculation for the score of each document. With sorting, hits are ordered by the value of the field, without any custom calculation.
 

2) Is it the case that a custom_score query with a script actually changes the
score values, whereas a sort will not change score values but just return hits in
a different order?  I suspected this from the docs, but I'm having trouble
testing the idea because all my scores come back 0.0 for each hit in my
tests.

Yes, it changes the score value. If the query would have given a certain document a score of 0.25, and your script is (for simplicity's sake) "_score * 2", then the score of that document will be 0.5.
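
To make the arithmetic concrete, here is a minimal Python sketch that builds a request body of the shape described in the Custom Score Query doc and applies the same formula locally. The inner `term` query, the field name `user`, and the helper `rescore` are illustrative assumptions and not part of the thread; the exact request shape should be checked against the docs for your version.

```python
import json

# The script from the reply: double whatever score the wrapped query produced.
SCRIPT = "_score * 2"

# Request body sketch. The inner term query on "user" is a made-up example.
body = {
    "query": {
        "custom_score": {
            "query": {"term": {"user": "kimchy"}},
            "script": SCRIPT,
        }
    }
}


def rescore(original_score: float) -> float:
    """Local stand-in for what the script computes for one document."""
    return original_score * 2


print(json.dumps(body, indent=2))
print(rescore(0.25))  # 0.5
```

Because the script replaces the score, the hits come back in the new score order and the `_score` values in the response reflect the script's output.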
 

3) The Sort doc reads, "Note, it is recommended, for single custom based
script based sorting, to use custom_score query instead as sorting based on
score is faster."  So, when would one want (or need) to use sort over
custom_score queries with a script to get ordered results?  (Perhaps the
answers to the above answer this.)

You can also provide a script that produces the sort values (as opposed to just saying "sort by this field"). If you do so, though, and it's the only sorting you do, then it's usually better to use the same script in a custom score query instead. Note that this only applies to numeric sorting with float precision.
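
The two variants can be put side by side. The following is a sketch only, assuming the `_script` sort element and `custom_score` query take the shapes shown in the linked docs; the field name `popularity` and the parameter `factor` are hypothetical.

```python
import json

# Hypothetical numeric script; "popularity" is an assumed field name.
script = "doc['popularity'].value * factor"
params = {"factor": 1.1}
match_all = {"match_all": {}}

# Variant A: the script produces the sort values.
sort_request = {
    "query": match_all,
    "sort": {
        "_script": {
            "script": script,
            "type": "number",
            "params": params,
            "order": "desc",
        }
    },
}

# Variant B: the same script becomes the score, and hits are returned
# in the default order (highest score first).
score_request = {
    "query": {
        "custom_score": {
            "query": match_all,
            "script": script,
            "params": params,
        }
    }
}

print(json.dumps(sort_request, indent=2))
print(json.dumps(score_request, indent=2))
```

Variant B only fits when the script yields a number that survives float precision, which is the caveat in the reply above.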
 

4) One of my searches needs to have results ordered alphabetically by a name
field (if present), else by an email field.  Is it correct to believe this would
have to be handled by a custom_score with a script (as I need if-else logic)
and a simple sort won't work?

The sort element can have two fields to sort by: first the name, and then the email. If that does not work (i.e. the issue is not identical names but names that simply don't exist), then a script can be used with the "if / else" you mention. That script *should* be a custom sort script and not a custom_score query, since it produces a string and not a number (with a number, you could have used custom score instead).

Note that mvel (the scripting language) gets a bit annoying when you try to implement complex logic (though it's very, very fast for formulas). I am working on allowing scripts to be provided in other languages.
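
Both options can be sketched as request bodies, followed by a local simulation of the fallback ordering. This is an illustration under assumptions: the field names `name` and `email` come from the question, but the script text (including the `.empty` accessor and the ternary form) and the sample hits are not from the thread and should be verified against the scripting doc.

```python
# Option 1: plain field sort, name first and then email.
two_field_sort = {"query": {"match_all": {}}, "sort": ["name", "email"]}

# Option 2: a string-typed sort script with the if / else fallback.
# The script body is an assumption about mvel syntax and field accessors.
fallback_script = "doc['name'].empty ? doc['email'].value : doc['name'].value"
script_sort = {
    "query": {"match_all": {}},
    "sort": {
        "_script": {"script": fallback_script, "type": "string", "order": "asc"}
    },
}


def sort_key(hit: dict) -> str:
    """Local equivalent of the fallback: name if present, else email."""
    return hit.get("name") or hit["email"]


# Made-up sample documents, used only to show the resulting order.
hits = [
    {"email": "zed@example.com"},
    {"name": "bob", "email": "bob@example.com"},
    {"name": "alice", "email": "a@example.com"},
]
ordered = sorted(hits, key=sort_key)
print([sort_key(h) for h in ordered])  # ['alice', 'bob', 'zed@example.com']
```

The script sort is declared with type "string" because the computed key is text, which is why it cannot be expressed as a score.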
 

5) The scripting module doc
(http://www.elasticsearch.com/docs/elasticsearch/modules/scripting/) lists
fields of type short, string, double, date, long, etc.  If I need results
ordered by date, what is the best way to store that field from a performance
perspective?

The simplest would be to add a sort by field on the date field. If you need to access it in a script, then the best way is to access it as it is stored in the index, which is milliseconds since the epoch as a long (this is what you get from doc['my_date_field'].value).
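
As a rough illustration, the sketch below builds a field sort on the date field and converts a Python `datetime` into the milliseconds-since-epoch form, which is handy when passing a date to a script as a parameter. The `{"order": "desc"}` option and the helper `to_epoch_millis` are assumptions for illustration; `my_date_field` is the name used in the reply.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Field sort on the date, newest first. The "order" option is assumed here.
date_sort = {
    "query": {"match_all": {}},
    "sort": [{"my_date_field": {"order": "desc"}}],
}


def to_epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Example: a value comparable with doc['my_date_field'].value in a script.
cutoff = to_epoch_millis(datetime(2010, 10, 6, 19, 34, tzinfo=timezone.utc))
print(cutoff)
```

Integer division by a one-millisecond `timedelta` keeps the result exact, avoiding the rounding that a float timestamp could introduce.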

 

6) Does sharding impact ordered search performance?

Basically, each query is a "map / reduce" operation. The query gets executed on the relevant shards, and the results then get reduced back to a single response (simplified). So, the more machines you have, with shards allocated across them, the faster the search will be. Note that replicas also play a role here (for example, increasing index.number_of_replicas from 1 to 2), since they are searchable as well.
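
The "map / reduce" shape can be modelled in a few lines of Python. This is a toy in-process model with made-up documents and function names, meant only to show why per-shard work can run in parallel; it is not how elasticsearch is implemented.

```python
import heapq
from itertools import islice


def search_shard(shard_docs, size):
    """'Map' step: one shard returns its own top `size` hits, best first."""
    return heapq.nlargest(size, shard_docs, key=lambda d: d["score"])


def reduce_hits(per_shard_hits, size):
    """'Reduce' step: merge the sorted per-shard lists into a global top `size`."""
    merged = heapq.merge(*per_shard_hits, key=lambda d: d["score"], reverse=True)
    return list(islice(merged, size))


# Three hypothetical shards, each holding part of the index.
shards = [
    [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.2}],
    [{"id": "c", "score": 0.7}, {"id": "d", "score": 0.5}],
    [{"id": "e", "score": 0.8}],
]

top = reduce_hits([search_shard(s, 3) for s in shards], 3)
print([hit["id"] for hit in top])  # ['a', 'e', 'c']
```

Each shard only needs to sort its own documents, and the final merge touches at most `size` hits per shard, so adding machines mainly shrinks the per-shard step.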
 

7) Are there any other important performance considerations for ordering
results through Elasticsearch I should be aware of (aside from the standard
Lucene considerations)?

Not sure what you include in the standard Lucene considerations, but elasticsearch has a mechanism similar in nature to the Lucene FieldCache, so when you sort on a field (or access it using doc[...] in a script), its terms will be loaded into memory.
 

As always, thanks so much for your time and for an awesome technology!

No problem, here to help!
 
--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Custom-Score-Query-and-Sort-questions-tp1644004p1644004.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.