relevance in the range 0.0 to 1.0?

Dustin Boswell
Is there a way to score documents so that the relevance score has a fixed range, such as 0 to 1.0? The default scoring can return arbitrarily high scores, depending on how many times the matching term appears in the document.
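To illustrate why the default score has no fixed ceiling, here is a toy sketch built from two factors of Lucene's classic TF-IDF similarity: a term-frequency factor of sqrt(freq) and an idf factor of 1 + ln(numDocs / (docFreq + 1)). Multiplying just these two together is a simplification, not Lucene's full formula (it omits field-length norms, query norms, coord, and boosts), and the corpus sizes are made up.

```python
import math

def toy_term_score(freq: int, num_docs: int, doc_freq: int) -> float:
    """Toy TF-IDF term score: sqrt(freq) * (1 + ln(num_docs / (doc_freq + 1))).

    Simplified on purpose: omits length norms, query norms, coord, and boosts.
    """
    tf = math.sqrt(freq)                             # classic-style tf factor
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # classic-style idf factor
    return tf * idf

# Same (hypothetical) corpus, more repetitions of the term: the score keeps climbing.
for freq in (1, 4, 25, 100):
    print(freq, toy_term_score(freq, num_docs=9, doc_freq=8))

# A rare term in a large (hypothetical) index pushes the scale up further.
print(toy_term_score(1, num_docs=1_000_000, doc_freq=9))
```

Neither factor is bounded, so the raw score's scale depends on both the document and the corpus statistics.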

It's tempting to normalize each score by the score of the top-matching document, but that is wrong, since the top document isn't always a perfect match.
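A small sketch of that failure, using made-up raw scores: after dividing by the top score, a result list whose best hit matches strongly and one whose best hit barely matches come out identical, so 1.0 stops meaning "perfect match".

```python
def normalize_by_top(scores):
    """Divide every score by the best score in the same result list."""
    if not scores:
        return []
    top = max(scores)
    return [s / top for s in scores]

strong_results = [8.0, 4.0, 2.0]   # hypothetical raw scores: top hit is a strong match
weak_results = [0.5, 0.25, 0.125]  # hypothetical raw scores: top hit is a weak match

print(normalize_by_top(strong_results))  # [1.0, 0.5, 0.25]
print(normalize_by_top(weak_results))    # [1.0, 0.5, 0.25]
```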

Are there other built-in scorers or parameter settings that will do this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/dbf1f137-b160-4e30-94b7-1cc9b8fb939e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: relevance in the range 0.0 to 1.0?

simonw-2
I think you should read this: https://wiki.apache.org/lucene-java/ScoresAsPercentages

It might help you make your point.

simon

(In reply to Dustin Boswell, Wednesday, November 5, 2014, 8:42:59 PM UTC+1)

Re: relevance in the range 0.0 to 1.0?

Dustin Boswell
Glad to know lots of other people have been asking for it too :)

I agree that dividing the default relevance score by some constant (or some number derived from the results) is a bad idea, for all the reasons that article describes.

I was hoping there was a non-default scorer designed to return scores in the 0-1.0 range. At my company we have a home-grown search engine that returns relevance scores in this range, and it works great. (Maybe I could discuss the algorithm further with the team offline; it's pretty good.) We're looking to use Elasticsearch for some of our applications, and this feature would help.
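The thread doesn't say how that home-grown engine computes its scores. As one hypothetical example of a score that lives in [0, 1] by design, this sketch uses query-term coverage: the fraction of distinct query terms that appear in the document. It is bounded, and 1.0 has a fixed meaning (every query term is present), but it ignores term frequency and term rarity, so it is not a drop-in replacement for TF-IDF. Lowercasing and whitespace tokenization are simplifying assumptions.

```python
def coverage_score(query: str, document: str) -> float:
    """Hypothetical bounded score: fraction of distinct query terms found in the document."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0  # an empty query matches nothing
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms) / len(query_terms)

print(coverage_score("relevance score range", "the relevance score of a document"))  # 2 of 3 terms present
print(coverage_score("relevance score", "score score score relevance"))  # 1.0, repetition doesn't inflate it
```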

I guess I could go down the road of writing a custom scoring algorithm (in Java?), but I'm not sure how much of an undertaking that is...
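Before writing a full custom scorer, one could prototype a lighter-weight idea outside Elasticsearch: squash each raw score through a monotonic saturating function such as s / (s + k). This is a swapped-in technique, not something proposed in the thread, and the constant k is a hypothetical tuning choice. For non-negative s the output always lies in [0, 1) and the ranking is preserved, but the caveats discussed above for dividing by a constant still apply: the value is not comparable across queries.

```python
def saturate(raw_score: float, k: float = 1.0) -> float:
    """Map a non-negative raw score into [0, 1) via s / (s + k); k > 0 is a tuning constant."""
    if raw_score < 0 or k <= 0:
        raise ValueError("expects raw_score >= 0 and k > 0")
    return raw_score / (raw_score + k)

raw = [0.0, 1.0, 3.0, 99.0]  # hypothetical raw scores, already in ranked order
print([saturate(s) for s in raw])  # [0.0, 0.5, 0.75, 0.99]
```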

(In reply to simonw, Thursday, November 6, 2014, 11:11:23 AM UTC-8)