Elasticsearch : Custom Scoring for Score Normalization


vishnuchilamakuru
This post has NOT been accepted by the mailing list yet.
Hi All,

I am looking to implement custom scoring in Elasticsearch. My main goal is to normalize the Elasticsearch scores for each sub-query and then for the final query.

Currently we search on two fields:
"title" - title of the document
"content" - description field of the document.

We also want to rank results as follows.

For example, if the search query is "Samsung Mobile", we want to score as follows:

Title Phrase Query with boost (say) 200 - title:"samsung mobile"^200
Title All Terms Match with boost 100 - (title:samsung AND title:mobile)^100
Content Phrase Query with boost (say) 30 - content:"samsung mobile"^30
Content All Terms Match with boost 20 - (content:samsung AND content:mobile)^20
Title One of the Terms Matching with boost 10 - (title:samsung OR title:mobile)^10
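For reference, the five tiers above could be expressed in the Elasticsearch query DSL as a `bool` query with one `should` clause per tier, using the boosts 200, 100, 30, 20 and 10. This is a minimal, unverified sketch: the helper name `build_query` is hypothetical, and it assumes `match_phrase` for the phrase tiers and `match` with `operator` set to `and`/`or` for the all-terms and any-term tiers.

```python
import json


def build_query(text: str) -> dict:
    """Hypothetical helper: one bool/should clause per ranking tier."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Title phrase match, boost 200
                    {"match_phrase": {"title": {"query": text, "boost": 200}}},
                    # Title, all terms required, boost 100
                    {"match": {"title": {"query": text, "operator": "and", "boost": 100}}},
                    # Content phrase match, boost 30
                    {"match_phrase": {"content": {"query": text, "boost": 30}}},
                    # Content, all terms required, boost 20
                    {"match": {"content": {"query": text, "operator": "and", "boost": 20}}},
                    # Title, any one term, boost 10
                    {"match": {"title": {"query": text, "operator": "or", "boost": 10}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("samsung mobile"), indent=2))
```

Note that the tiers overlap: a document matching the title phrase also matches the two looser title clauses, so under `bool`/`should` their scores add up. A `dis_max` query would instead keep only the best-scoring tier, which may be closer to the intended ranking.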

In addition, we want predictable/normalized score ranges for each of these queries, say:

Title Phrase Query - [some min value to 200]
Title All Terms Match - [some min value to 100]
Content Phrase Query - [some min value to 30]
Content All Terms Match - [some min value to 20]
Title One of the Terms Matching - [some min value to 10]
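One way to force each tier into a fixed band, independent of Elasticsearch internals, is client-side min-max scaling of the raw clause scores over a result set. This is a swapped-in technique, not something the post or Elasticsearch prescribes; `to_band` is a hypothetical helper, and any lower bound passed to it is a placeholder since "some min value" is left open above.

```python
def to_band(raw: float, raw_min: float, raw_max: float, lo: float, hi: float) -> float:
    """Min-max scale a raw clause score into [lo, hi] (hypothetical helper)."""
    if raw_max == raw_min:
        # Degenerate case: every hit scored the same, so put it at the top of the band.
        return hi
    frac = (raw - raw_min) / (raw_max - raw_min)
    frac = min(max(frac, 0.0), 1.0)  # clamp in case raw falls outside the observed range
    return lo + frac * (hi - lo)


if __name__ == "__main__":
    # A title-phrase score halfway through the observed range, mapped into a
    # placeholder band [150, 200].
    print(to_band(5.0, 1.0, 9.0, 150.0, 200.0))
```

The drawback is that the bands depend on the scores observed in each result set, so the same document can land at different points in its band for different queries.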

So we are aiming for a per-clause score of the form (ConstantScore * IDF * QueryNorm).

QueryNorm is optional, since it is the same for every clause of a single query. For our use case we don't compare scores across different queries, so it can be dropped.
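The target formula can be sketched numerically for a single term. This assumes Lucene's classic TF-IDF idf, 1 + ln(numDocs / (docFreq + 1)); `classic_idf` and `target_clause_score` are hypothetical helper names, and queryNorm defaults to 1 to reflect dropping it.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Lucene classic (TF-IDF) idf for one term: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def target_clause_score(boost: float, doc_freq: int, num_docs: int,
                        query_norm: float = 1.0) -> float:
    """Desired per-clause score: boost * IDF * queryNorm (queryNorm = 1 means 'ignored')."""
    return boost * classic_idf(doc_freq, num_docs) * query_norm


if __name__ == "__main__":
    # A term found in 9 of 1000 documents, in the title-phrase tier (boost 200).
    print(target_clause_score(200, doc_freq=9, num_docs=1000))
```

Here IDF appears exactly once, which is the property the TF-IDF attempt below fails to deliver.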

I tried following two approaches.

Constant Score Query Approach


To achieve this I tried a constant_score query. The problem is that it doesn't take IDF into account when calculating the score.
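A constant_score clause of the kind described might be built like this. It is a sketch only: `constant_score_clause` is a hypothetical helper, and it assumes an Elasticsearch version where `constant_score` wraps its query in a `filter`. Every matching document receives exactly `boost`, regardless of how rare the terms are, which is why IDF never reaches the score.

```python
def constant_score_clause(field: str, text: str, boost: float) -> dict:
    """Hypothetical helper: wrap a phrase match so every hit scores exactly `boost`."""
    return {
        "constant_score": {
            # Filter context: matching is yes/no, no term statistics are used.
            "filter": {"match_phrase": {field: text}},
            "boost": boost,
        }
    }


if __name__ == "__main__":
    print(constant_score_clause("title", "samsung mobile", 200))
```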

TF-IDF Approach


I also tried the TF-IDF approach, setting fieldLengthNorm = 1 and termFrequency = 1. However, FieldWeight = (TF * IDF * FieldLengthNorm), so IDF still enters the field weight.

So each query/clause score = QueryWeight * FieldWeight = (boost * IDF * queryNorm) * (1 * IDF * 1) = boost * queryNorm * IDF^2
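The arithmetic above can be checked with a simplified model of the classic per-clause score. The model ignores coord and other factors, and `clause_score_tf_idf` is a hypothetical name. With tf = 1 and fieldLengthNorm = 1, the score divided by boost * queryNorm is exactly IDF squared.

```python
def clause_score_tf_idf(boost: float, idf: float, query_norm: float = 1.0,
                        tf: float = 1.0, field_norm: float = 1.0) -> float:
    """Simplified classic clause score: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm  # boost * IDF * queryNorm
    field_weight = tf * idf * field_norm     # TF * IDF * fieldLengthNorm
    return query_weight * field_weight


if __name__ == "__main__":
    boost, idf, qn = 10.0, 2.0, 0.5
    score = clause_score_tf_idf(boost, idf, qn)
    print(score)                 # 20.0
    print(score / (boost * qn))  # 4.0, i.e. idf ** 2
```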

Here we want IDF to be multiplied in only once, and we would also like to know how to turn off queryNorm, if possible.
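One possible direction, checked here only as arithmetic: since IDF enters twice, a custom similarity whose idf returns the square root of the classic value and whose queryNorm returns 1 would make each clause score boost * IDF. Older Lucene versions' `TFIDFSimilarity` had overridable `idf` and `queryNorm` methods (queryNorm was removed in Lucene 7), but how such a similarity would be plugged into a given Elasticsearch version is an assumption to verify. `sqrt_idf` and `patched_clause_score` are hypothetical names.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Lucene classic idf: 1 + ln(numDocs / (docFreq + 1)).
    Positive whenever docFreq <= numDocs, so its square root is defined."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def sqrt_idf(doc_freq: int, num_docs: int) -> float:
    """Hypothetical replacement idf: the two idf factors now multiply to one classic idf."""
    return math.sqrt(classic_idf(doc_freq, num_docs))


def clause_score(boost: float, idf: float, query_norm: float) -> float:
    """(boost * idf * queryNorm) * (tf=1 * idf * fieldLengthNorm=1)."""
    return (boost * idf * query_norm) * (1.0 * idf * 1.0)


def patched_clause_score(boost: float, doc_freq: int, num_docs: int) -> float:
    """Clause score with queryNorm forced to 1 and idf replaced by sqrt_idf."""
    return clause_score(boost, sqrt_idf(doc_freq, num_docs), query_norm=1.0)


if __name__ == "__main__":
    print(patched_clause_score(200, doc_freq=9, num_docs=1000))
    print(200 * classic_idf(9, 1000))  # same value, up to float rounding
```

The trade-off is that the clause score is no longer comparable across queries, which matches the post's statement that cross-query comparison is not needed.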

Please suggest how to do this. If I have missed any key points, please point me in the right direction for normalizing scores.





