Elasticsearch : Custom Scoring for Score Normalization
I am looking to implement custom scoring in Elasticsearch. My main goal is to normalize the Elasticsearch scores for each sub-query and then for the final query.
Currently we search on two fields:
"title" - title of the document
"content" - description field of the document.
We also want to rank search results as follows.
Ex: if the search query is "Samsung Mobile", we want to score it as follows.
Title Phrase Query with boost say 200 - title : "samsung mobile" ^ 200
Title All Terms Match with boost 100 - (title : samsung AND title : mobile )^ 100
Content Phrase Query with boost say 30 - content : "samsung mobile" ^ 30
Content All Terms Match with boost 20 - (content : samsung AND content : mobile )^ 20
Title One of the Terms Matching with boost 10 - (title : samsung OR title : mobile )^ 10
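For illustration, the five tiers above could be written as one request body. This is only a sketch: it assumes the query DSL's `bool`, `match` and `match_phrase` queries stand in for the query-string forms listed above, it takes 10 as the boost of the last tier, and `build_tiered_query` is a made-up helper name.

```python
import json


def build_tiered_query(text: str) -> dict:
    """Build a bool query with one 'should' clause per ranking tier."""
    tiers = [
        # Title phrase match, boost 200
        {"match_phrase": {"title": {"query": text, "boost": 200}}},
        # Title, all terms must match, boost 100
        {"match": {"title": {"query": text, "operator": "and", "boost": 100}}},
        # Content phrase match, boost 30
        {"match_phrase": {"content": {"query": text, "boost": 30}}},
        # Content, all terms must match, boost 20
        {"match": {"content": {"query": text, "operator": "and", "boost": 20}}},
        # Title, any one term matches, boost 10 (assumed value)
        {"match": {"title": {"query": text, "operator": "or", "boost": 10}}},
    ]
    return {"query": {"bool": {"should": tiers}}}


if __name__ == "__main__":
    print(json.dumps(build_tiered_query("samsung mobile"), indent=2))
```

Note that a bool query adds up the scores of all matching should clauses, so a document matching the title phrase also collects the scores of the weaker title tiers.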
In addition to these, we want predictable (normalized) score ranges for each of these queries. Let's say:
Title Phrase Query - [ some min value to 200 ]
Title All Terms Match - [some min value to 100]
Content Phrase Query - [ some min value to 30]
Content All Terms Match - [ some min value to 20]
Title One of the Terms Matching - [some min value to 10]
So we are trying to get a score of the form (ConstantScore * IDF * QueryNorm).
QueryNorm is optional, since it is constant within a single query. For our use case we do not currently compare scores across different queries, so it can be left out.
I tried the following two approaches.
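To make the target formula concrete, here is a rough Python sketch for a single term. It assumes Lucene's classic IDF, 1 + ln(numDocs / (docFreq + 1)). Dividing by the largest IDF a matching term can have (docFreq = 1) is my own assumption about one way to cap each tier at its boost value, not something Elasticsearch does by itself. The names `classic_idf` and `tier_score` are made up.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as computed by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tier_score(boost: float, doc_freq: int, num_docs: int,
               query_norm: float = 1.0) -> float:
    """ConstantScore * IDF * QueryNorm, with IDF rescaled into (0, 1].

    The rescaling (dividing by the IDF of a term that occurs in exactly
    one document) is an assumption made so the result never exceeds boost.
    """
    max_idf = classic_idf(1, num_docs)
    normalized_idf = classic_idf(doc_freq, num_docs) / max_idf
    return boost * normalized_idf * query_norm


if __name__ == "__main__":
    # A rare term scores at the top of the tier, a common term lower.
    print(tier_score(200, doc_freq=1, num_docs=1000))
    print(tier_score(200, doc_freq=500, num_docs=1000))
```

With this scaling the rarest possible term scores exactly the boost, and more common terms score somewhere below it, which gives a range of the form [some min value to boost].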
Constant Score Query Approach
To achieve this I tried the constant score query. The problem is that it does not consider IDF during score calculation.
I also tried a TF-IDF approach, setting fieldLengthNorm = 1 and termfrequency = 1. But we found FieldWeight = (TF * IDF * FieldlengthNorm)
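As a small worked sketch of that product, assuming the classic TF-IDF similarity, where the TF factor is the square root of the raw term frequency (the function name `field_weight` is made up):

```python
import math


def field_weight(term_freq: float, idf: float, field_length_norm: float) -> float:
    """FieldWeight = TF * IDF * FieldLengthNorm, with TF = sqrt(term_freq)."""
    tf = math.sqrt(term_freq)
    return tf * idf * field_length_norm


if __name__ == "__main__":
    # With term frequency and field length norm both pinned to 1,
    # the field weight is just the IDF passed in.
    print(field_weight(1, 2.5, 1.0))
```

So with both of those factors fixed at 1, the field weight reduces to the IDF alone.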