Hi everybody,
I have an index for keeping book records, such as:
ElasticSearch Cookbook
ElasticSearch Server
Mastering ElasticSearch
ElasticSearch
I have more than 2M records.
Search cases:
search term           --- expected result        --- (case)
------------------------------------------------------------
elastic cook          --- ElasticSearch Cookbook --- (partial match)
ElasticSearchCookBook --- ElasticSearch Cookbook --- (no space)
ekasticsearch         --- ElasticSearch          --- (typo)
etc.
My index settings and mappings are below.
Analyzers:
I have 5 analyzers (a quick way to check what each one emits is shown right after the settings):
edge_nGram_no_split_field: edge n-grams over the whole search term (min_gram=1, max_gram=15)
edge_nGram_token_field: edge n-grams over each token of the search term (min_gram=2, max_gram=15)
nGram_no_space_field: whitespace removed from the search term, then n-grams (min_gram=3, max_gram=4)
exact_field: exact match on the whole search term
token_field: exact match on each token of the search term
{
  "book" : {
    "settings" : {
      "index" : {
        "analysis" : {
          "filter" : {
            "edgeNGramFilter" : {
              "type" : "edgeNGram",
              "min_gram" : "2",
              "max_gram" : "15"
            }
          },
          "char_filter" : {
            "noSpace" : {
              "type" : "pattern_replace",
              "pattern" : " ",
              "replacement" : ""
            },
            "quotes" : {
              "type" : "pattern_replace",
              "pattern" : "'",
              "replacement" : ""
            }
          },
          "analyzer" : {
            "edge_nGram_no_split_field" : {
              "type" : "custom",
              "tokenizer" : "no_split_edge_nGram",
              "char_filter" : [ "quotes" ],
              "filter" : [ "lowercase", "asciifolding" ]
            },
            "edge_nGram_token_field" : {
              "type" : "custom",
              "tokenizer" : "standard",
              "char_filter" : [ "quotes" ],
              "filter" : [ "lowercase", "asciifolding", "edgeNGramFilter" ]
            },
            "nGram_no_space_field" : {
              "type" : "custom",
              "tokenizer" : "no_space_nGram",
              "char_filter" : [ "quotes", "noSpace" ],
              "filter" : [ "lowercase", "asciifolding" ]
            },
            "exact_field" : {
              "type" : "custom",
              "tokenizer" : "keyword",
              "filter" : [ "lowercase", "asciifolding" ]
            },
            "token_field" : {
              "type" : "custom",
              "tokenizer" : "standard",
              "char_filter" : [ "quotes" ],
              "filter" : [ "lowercase", "asciifolding" ]
            }
          },
          "tokenizer" : {
            "no_space_nGram" : {
              "type" : "nGram",
              "min_gram" : "3",
              "max_gram" : "4"
            },
            "no_split_edge_nGram" : {
              "type" : "edgeNGram",
              "min_gram" : "1",
              "max_gram" : "15"
            }
          }
        },
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1030499"
        },
        "uuid" : "BuWYNc9LQbeDU7GHEUeAQw"
      }
    }
  }
}
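As a side note, this is how I verify what each analyzer emits, using the _analyze API (host/port and the sample text are just for illustration):

curl -XGET 'localhost:9200/book/_analyze?analyzer=edge_nGram_token_field&pretty' -d 'elastic cook'

For edge_nGram_token_field this returns the edge n-grams of each token: el, ela, elas, elast, elasti, elastic, co, coo, cook.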
Mapping:
{
  "book" : {
    "mappings" : {
      "en" : {
        "properties" : {
          "id" : {
            "type" : "string"
          },
          "name.nGramNoSpace" : {
            "type" : "string",
            "norms" : { "enabled" : false },
            "analyzer" : "nGram_no_space_field"
          },
          "name.edgeNGram" : {
            "type" : "string",
            "norms" : { "enabled" : false },
            "analyzer" : "edge_nGram_token_field"
          },
          "name.edgeNGramNoSplit" : {
            "type" : "string",
            "norms" : { "enabled" : false },
            "analyzer" : "edge_nGram_no_split_field"
          },
          "name.exact" : {
            "type" : "string",
            "analyzer" : "exact_field"
          },
          "name.token" : {
            "type" : "string",
            "norms" : { "enabled" : false },
            "analyzer" : "token_field"
          }
        }
      }
    }
  }
}
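For context, a record is indexed like this (the document ID and the values here are made up for illustration):

curl -XPUT 'localhost:9200/book/en/1' -d '{
  "id" : "1",
  "name" : "ElasticSearch Cookbook"
}'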
My query is below. minimum_should_match is calculated with the following formula:
minimum_should_match = length of search term * 0.25
{
  "bool" : {
    "must" : {
      "match" : {
        "name" : {
          "query" : "elastic cook",
          "type" : "phrase",
          "operator" : "OR",
          "fuzziness" : "1",
          "max_expansions" : 4,
          "minimum_should_match" : "2",
          "cutoff_frequency" : 0.01
        }
      }
    },
    "should" : [
      {
        "match" : {
          "name.exact" : {
            "query" : "elastic cook",
            "type" : "phrase",
            "boost" : 4.0
          }
        }
      },
      {
        "match" : {
          "name.token" : {
            "query" : "elastic cook",
            "type" : "phrase"
          }
        }
      },
      {
        "match" : {
          "name.edgeNGramNoSplit" : {
            "query" : "elastic cook",
            "type" : "phrase",
            "boost" : 4.0,
            "fuzziness" : "1",
            "max_expansions" : 8
          }
        }
      },
      {
        "match" : {
          "name.edgeNGram" : {
            "query" : "elastic cook",
            "type" : "phrase",
            "fuzziness" : "1",
            "max_expansions" : 4
          }
        }
      }
    ]
  }
}
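Note that this is only the query clause; the actual search request wraps it in a top-level "query" object, e.g.:

curl -XGET 'localhost:9200/book/en/_search?pretty' -d '{
  "query" : {
    "bool" : { ...the bool query above... }
  }
}'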
With these settings and this query, response times are approximately as follows:
length of search term -- response time (ms)
3 -- 120
4 -- 130
5 -- 140
6 -- 150
7 -- 165
8 -- 195
9 -- 225
10 -- 270
11 -- 350
12 -- 400
13 -- 450
14 -- 600
15 -- 700
As I mentioned, I have more than 2M records. Are these response times normal, or am I doing something wrong?