Help me understand how ES calculate the score to match query

Xudong You
I have two documents as follows:

1.
{
"title":"xbox"
}

2.
{
"title":"xbox xbox xbox"
}

Then I search the documents with the following query:
{
"query":{"match":{"title":"xbox"}}
}

ES returns the following result:
{"took":133,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.30685282,
"hits":[
{"_index":"storetest1","_type":"type","_id":"1","_score":0.30685282,"_source":{"title":"xbox","keywords":["xbox"]}},

{"_index":"storetest1","_type":"type","_id":"2","_score":0.26574233,"_source":{"title":"xbox xbox xbox","keywords":["xbox"]}}]}}


My question is: why did #1 get a higher score than #2? I expected #2 to score higher than #1, since "xbox" appears more times in the title of #2.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/03afe5b3-0255-4d0d-ba15-0e9c2afbb96e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Help me understand how ES calculate the score to match query

Nhật Quang Phan

You can enable explain for your query to see how Elasticsearch calculates the score:

{
"explain": true,
"query": {
"match": {
"title": "xbox"
}
}
}
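
For the two documents in the question, the score breakdown that explain reports can be reproduced by hand. The sketch below is a rough illustration, not Elasticsearch's actual code. It assumes the classic Lucene TF/IDF similarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), norm = 1 / sqrt(number of terms in the field)) and assumes each document sits alone on its own shard. The function names and the STORED_NORM table are made up for illustration; the 0.5 entry reflects the lossy one-byte storage of the norm (1/sqrt(3) is about 0.577) and is inferred from the score reported in the question.

```python
import math


def tf(freq):
    # Term frequency factor: square root of how often the term occurs in the field.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency, computed from shard-local statistics.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Assumed stored norms per field length (illustrative table, not a Lucene API).
# 1 / sqrt(3) is about 0.577, but the norm is stored lossily; 0.5 is inferred
# from the score reported in the question.
STORED_NORM = {1: 1.0, 3: 0.5}


def score(freq, num_terms, doc_freq, max_docs):
    # For a single-term query the query weight normalizes to 1,
    # so the score reduces to tf * idf * norm.
    return tf(freq) * idf(doc_freq, max_docs) * STORED_NORM[num_terms]


# Assumption: each document is alone on its shard (docFreq=1, maxDocs=1).
score_doc1 = score(freq=1, num_terms=1, doc_freq=1, max_docs=1)  # "xbox"
score_doc2 = score(freq=3, num_terms=3, doc_freq=1, max_docs=1)  # "xbox xbox xbox"

print(f"doc 1: {score_doc1:.6f}")
print(f"doc 2: {score_doc2:.6f}")
```

Under these assumptions the two values come out at roughly 0.3069 and 0.2657, matching the scores in the question. The repeated term raises tf by a factor of sqrt(3), but the longer field lowers the norm to 0.5, so document 2 ends up below document 1.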

On Tuesday, March 10, 2015 at 3:15:50 PM UTC+7, Xudong You wrote:
My question is: why did #1 get a higher score than #2? I expected #2 to score higher than #1, since "xbox" appears more times in the title of #2.

Re: Help me understand how ES calculate the score to match query

Xudong You
Thanks!
I tried explain and now better understand how the score is computed. But I still have a question about the IDF score. The IDF in the explain output of my query is:
{
  "value": 0.30685282,
  "description": "idf(docFreq=1, maxDocs=1)"
}

What do docFreq and maxDocs mean here? Per the definition of IDF, the score should depend on the total number of documents in the index, but the value seems to always be 0.30685282 no matter how many docs I insert into the index.


On Tuesday, March 10, 2015 at 5:39:56 PM UTC+8, Nhật Quang Phan wrote:

You can enable explain for your query to see how Elasticsearch calculates the score:

{
"explain": true,
"query": {
"match": {
"title": "xbox"
}
}
}

Re: Help me understand how ES calculate the score to match query

Doug Turnbull
A couple of things are going on here.

First, read "Why is Relevance Broken" (https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html). Your IDF might not be changing due to sharding.

Second,
docFreq reflects this term's actual document frequency (how many documents the term occurs in).
maxDocs reflects the total number of documents on this shard.

Third,
maxDocs (and docFreq) do not reflect deletions: deleted documents are still counted until their segments are merged.

Lastly,
I presume you can find the documents you think you're adding in the index?
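
As a rough illustration of the second point, here is how docFreq and maxDocs would feed into the idf value, assuming the classic Lucene formula idf = 1 + ln(maxDocs / (docFreq + 1)). That formula is an assumption about the similarity in use, though it does reproduce the 0.30685282 from the explain output. The shard contents below are hypothetical.

```python
import math


def idf(doc_freq, max_docs):
    # Classic TF/IDF style idf, computed from the statistics of ONE shard.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical layout: a few matching documents spread one per shard.
# Every shard then reports docFreq=1, maxDocs=1, so idf is identical everywhere.
one_doc_per_shard = [idf(doc_freq=1, max_docs=1) for _ in range(5)]

# Hypothetical layout: one shard holding 100 documents, 10 of which contain the term.
busy_shard = idf(doc_freq=10, max_docs=100)

print("one doc per shard:", [round(v, 8) for v in one_doc_per_shard])
print("busy shard:", round(busy_shard, 4))
```

Because the statistics are per shard, a handful of documents spread over 5 shards can leave every shard with docFreq=1 and maxDocs=1, and the idf only starts to move once individual shards hold more documents.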

Hope that helps
-Doug

On Tue, Mar 10, 2015 at 9:46 PM, Xudong You <[hidden email]> wrote:
Thanks!
I tried explain and now better understand how the score is computed. But I still have a question about the IDF score. The IDF in the explain output of my query is:
{
  "value": 0.30685282,
  "description": "idf(docFreq=1, maxDocs=1)"
}

What do docFreq and maxDocs mean here? Per the definition of IDF, the score should depend on the total number of documents in the index, but the value seems to always be 0.30685282 no matter how many docs I insert into the index.


--
Doug Turnbull
Search Relevance Lead

Re: Help me understand how ES calculate the score to match query

Xudong You
Thanks a lot!
I now better understand how IDF works in ES. As you said, it was caused by sharding. After I added enough documents, I do see the IDF value change, along with docFreq and maxDocs in the output.


On Wednesday, March 11, 2015 at 9:54:13 AM UTC+8, Doug Turnbull wrote:
A couple of things are going on here.

First, read "Why is Relevance Broken". Your IDF might not be changing due to sharding.
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html

Second,
docFreq reflects this term's actual document frequency (how many documents the term occurs in).
maxDocs reflects the total number of documents on this shard.

Third,
maxDocs (and docFreq) do not reflect deletions: deleted documents are still counted until their segments are merged.

Lastly,
I presume you can find the documents you think you're adding in the index?

Hope that helps
-Doug

--
Doug Turnbull
Search Relevance Lead
OpenSource Connections (http://o19s.com)
