Similarity score in array

Ban Mido
Hi,

I have a field called tags, which is an array of elements, and I have applied a flat analyzer to it.
Now I want to supply an array of tags and see which feed matches that array the most. The similarity logic is:

  1. The feed with the maximum number of matching tags should come first.
  2. If two feeds have the same number of matched tags, the feed with the higher percentage of matched tags should come first.

In short, if my query is "terms" : { "tags" : [ "one" , "two" ] },
then the feed { "tags" : [ "one" , "two" , "three" , "four" ] } should have a greater similarity score than the feed { "tags" : [ "one" , "two" , "three" , "four" , "five" ] }, because the percentage of matched tags is 50% (2 of 4) for the former and 40% (2 of 5) for the latter.
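
To make the intended ordering concrete, here is a minimal sketch in plain Python (not an Elasticsearch query). It assumes "percentage of matched tags" means matched tags divided by the feed's own tag count; the function name `rank_feeds` and the feed ids are hypothetical.

```python
def rank_feeds(query_tags, feeds):
    """Order feeds by matched-tag count, then by matched fraction of the feed's tags."""
    query = set(query_tags)

    def sort_key(feed):
        tags = set(feed["tags"])
        matched = len(query & tags)
        # Assumption: the "percentage" is matched tags over the feed's total tags.
        fraction = matched / len(tags) if tags else 0.0
        return (matched, fraction)

    # Tuples compare element by element, so the count wins and the fraction breaks ties.
    return sorted(feeds, key=sort_key, reverse=True)


feeds = [
    {"id": "five_tags", "tags": ["one", "two", "three", "four", "five"]},
    {"id": "four_tags", "tags": ["one", "two", "three", "four"]},
    {"id": "one_match", "tags": ["one"]},
]

ranked = rank_feeds(["one", "two"], feeds)
print([feed["id"] for feed in ranked])
# prints ['four_tags', 'five_tags', 'one_match']
```

The feed with a single tag matches 100% of its own tags but still ranks last, because rule 1 (the match count) takes priority over rule 2.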

Thanks
           Vineeth

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
Re: Similarity score in array

ppearcy
Hi,
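  Doing a stock sort on score will get you most of the way there. However, it will not strictly adhere to the first rule, since the default scoring is TF/IDF based and also weighs term rarity and field length, not just the number of matching tags.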

Implementing a custom score would definitely work:
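
As a rough illustration of the number such a custom score would have to produce, the two rules can be folded into one value: the match count plus the matched fraction. This is plain Python, not Elasticsearch's scripting API, and the function name `tag_score` is hypothetical. It assumes the fraction is matched tags over the feed's total tags, so it never exceeds 1 and cannot outweigh one extra matching tag.

```python
def tag_score(query_tags, feed_tags):
    """Single score: integer part ranks by match count, fractional part breaks ties."""
    query = set(query_tags)
    tags = set(feed_tags)
    matched = len(query & tags)
    if matched == 0:
        return 0.0
    # matched / len(tags) is in (0, 1], so a feed with k matches scores at most k + 1,
    # while any feed with k + 1 matches scores strictly more than k + 1.
    return matched + matched / len(tags)


query = ["one", "two"]
four = tag_score(query, ["one", "two", "three", "four"])
five = tag_score(query, ["one", "two", "three", "four", "five"])
single = tag_score(query, ["one"])
print(four > five > single)
```

Sorting by this value in descending order gives the same ordering as comparing the match count first and the fraction second.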

Getting more out there, but likely more optimal, you could perhaps define your own similarity model. Something to play around with at least:

Best Regards,
Paul

On Sunday, September 22, 2013 1:56:35 PM UTC-4, Ban Mido wrote: