Field-length norm fails on fields with 3 and 4 words

Fil ES
Hello,

I am experiencing a very annoying behaviour in the Elasticsearch score calculation: the field-length norm fails to distinguish between fields that contain 3 words and fields that contain 4 words. It always returns the same score for both. Example:

LANCA HOTEL EXTREME  and  MASSIVE AMAZING HOTEL GROUP

Both come back with the same field length and get the same score for the field-length norm.

I tried using the BM25 similarity instead of the default one and adjusting its parameters, but the output was always the same.

Does anybody have any idea why this happens? It is extremely annoying, as most fields in each document contain about 3-4 words.

Thank you,
Fil

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a007c1fc-a5c4-45f5-9f83-7f414831170b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Field-length norm fails on fields with 3 and 4 words

Ivan Brusic

The field norm is computed at index time and stored in a single byte, which can lead to a loss of precision. This behavior might have changed in newer versions of Lucene, but probably not.
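To illustrate the precision loss, here is a rough Python sketch of the single-byte encoding. It assumes the norm is 1/sqrt(number of terms) with no index-time boost, and that the byte encoding follows the logic of Lucene's SmallFloat.floatToByte315 as it existed in Lucene 4.x; the function names below are my own, and this is a re-implementation for illustration, not the library code itself.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Encode a float into one byte (returned as 0..255), mirroring the
    logic of Lucene's SmallFloat.floatToByte315 (assumed, Lucene 4.x era)."""
    # Reinterpret the float32 bit pattern as a signed 32-bit integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21            # drop the low 21 mantissa bits
    fzero = (63 - 15) << 3        # offset of the smallest encodable exponent
    if small <= fzero:
        return 0 if bits <= 0 else 1   # zero/negative, or underflow
    if small >= fzero + 0x100:
        return 255                     # overflow: largest encodable value
    return small - fzero


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms: int) -> float:
    """Norm as it would be read back after the one-byte round trip,
    assuming norm = 1 / sqrt(num_terms) and no index-time boost."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in range(1, 7):
        exact = 1.0 / math.sqrt(n)
        print(f"{n} terms: exact={exact:.4f} stored={stored_norm(n)}")
    # 3 and 4 terms both come back as 0.5, so they score identically.
```

Under these assumptions, 1/sqrt(3) (about 0.577) is truncated down to 0.5, which is exactly the value for 4 terms. As far as I know, the BM25 similarity in the same Lucene versions stores the field length through the same one-byte encoding, which would explain why switching similarity did not change the result.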

Ivan

On Apr 30, 2015 6:42 PM, "Fil ES" <[hidden email]> wrote:
