Elasticsearch search results influenced by the number of records (size)


saikris
I'm seeing an odd discrepancy when searching on person names with Elasticsearch 1.7.3.

My index holds 38 million person names in 8 shards across 2 nodes, with 1 replica.

The indexed data contains the name: BERTIE RICHARD BUNCE

My search query: BERTIE BUNCE
I'm using the following query, with a DFS search type:


{
  "explain": true,
  "from": 0,
  "size": 10,
  "filter": {
    "bool": {
      "must": [
      ]
    }
  },
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "dis_max": {
                "tie_breaker": 0.7,
                "boost": 1.2,
                "queries": [
                  {
                    "match": {
                      "FullName_PHONETIC": {
                        "query": "Bertie Bunce",
                        "boost": 100.0
                      }
                    }
                  },
                  {
                    "match": {
                      "FullName_MUNGED_PHONETIC": {
                        "query": "Bertie Bunce",
                        "boost": 100.0
                      }
                    }
                  }
                ]
              }
            },
            {
              "match": {
                "FirstName_NGRAM": {
                  "query": "Bertie Bunce",
                  "boost": 100.0
                }
              }
            }
          ]
        }
      }
    }
  }
}
When I run the above query with a size of up to 45, the top result is "BERTIE BOYANCE" (score 7.5). When I increase the size to 46, "BERTIE BUNCE" appears at the top with a score of 10. That is the result I would expect to see regardless of the size.
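
For anyone who wants to reproduce the comparison, here is a minimal sketch that builds the same request body twice with only "size" changed and pulls out the top hit of each response. The host (localhost:9200), the index name ("names") and the DFS variant ("dfs_query_then_fetch") are assumptions for illustration, not details from my setup above; the helper names are hypothetical as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions (not stated in the post): host, index name, and DFS variant.
BASE_URL = "http://localhost:9200"
INDEX = "names"
SEARCH_TYPE = "dfs_query_then_fetch"


def build_body(size):
    """Return the query from the post with only `size` varied."""
    def match(field):
        return {"match": {field: {"query": "Bertie Bunce", "boost": 100.0}}}

    return {
        "explain": True,
        "from": 0,
        "size": size,
        "filter": {"bool": {"must": []}},
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "should": [
                            {
                                "dis_max": {
                                    "tie_breaker": 0.7,
                                    "boost": 1.2,
                                    "queries": [
                                        match("FullName_PHONETIC"),
                                        match("FullName_MUNGED_PHONETIC"),
                                    ],
                                }
                            },
                            match("FirstName_NGRAM"),
                        ]
                    }
                }
            }
        },
    }


def search_url():
    """Build the _search URL with the assumed search type as a query parameter."""
    query = urlencode({"search_type": SEARCH_TYPE})
    return "%s/%s/_search?%s" % (BASE_URL, INDEX, query)


def top_hit(response):
    """Return (_id, _score, _shard) of the first hit, or None if there are no hits."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    first = hits[0]
    # Missing keys come back as None rather than raising.
    return first.get("_id"), first.get("_score"), first.get("_shard")


def run(size):
    """Send the search to the cluster (requires a reachable node)."""
    payload = json.dumps(build_body(size)).encode("utf-8")
    req = Request(search_url(), data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling top_hit(run(45)) and top_hit(run(46)) against a live node and comparing the two tuples, together with the "_explanation" of each hit, should show which part of the score changes between the two sizes.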

Why does the size influence the ranking and the score?

Please help.