Hi Mike
On Fri, 2012-09-14 at 15:11 -0700, Mike wrote:
> My string field is indexed with an nGram analyzer, and I can't seem to
> get phrase matching (using " " in my search text) to work. Other things,
> like fuzzy matching with ~, combining words with && and ||, and boosting
> with ^, work fine though. Am I doing something wrong, or does phrase
> matching not work with ngrams?
Phrase matching does work with ngrams, but there is a long-standing bug
in Lucene's ngram (and edge-ngram) token filters: they assign token
positions differently from the standard tokenizer.
So if you analyze the field with ngrams and do a phrase search on the
field using the SAME analyzer, it will work. But at search time you are
using myAnalyzer2, which has the standard tokenizer and no ngram filter,
so the positions of the terms in your query don't line up with the
positions stored in the index.
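To make the position mismatch concrete, here is a toy model in Python
(not Lucene's actual code). It assumes the buggy behaviour in which the
ngram filter gives every gram its own position, and it approximates the
standard tokenizer with whitespace splitting plus a small hypothetical
stop list. A phrase matches only if the query terms sit at consecutive
positions among the indexed terms:

```python
# Toy model (not Lucene code) of why the phrase "ibm eps" misses.
# Assumption: the ngram filter emits every gram at a NEW position
# (the buggy behaviour); a list index stands in for a token position.

# Small hypothetical subset of the English stop list.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text):
    """Rough stand-in for standard tokenizer + lowercase + stop filters."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def ngrams(token, min_gram=1, max_gram=8):
    """All grams of a token, shortest first, then left to right."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def my_analyzer(text):
    """Index-time analyzer: one position per gram (the buggy behaviour)."""
    return [g for tok in tokenize(text) for g in ngrams(tok)]


def my_analyzer2(text):
    """Search-time analyzer: one position per word, no grams."""
    return tokenize(text)


def phrase_match(doc_terms, query_terms):
    """True if the query terms occur at consecutive positions in the doc."""
    n = len(query_terms)
    return any(doc_terms[p:p + n] == query_terms
               for p in range(len(doc_terms) - n + 1))


if __name__ == "__main__":
    docs = ["ibm eps", "ibm q2 eps", "ibm 2001 eps"]
    index = {d: my_analyzer(d) for d in docs}

    print(my_analyzer("ibm"))  # ['i', 'b', 'm', 'ib', 'bm', 'ibm']

    # Query analyzed with myAnalyzer2: "ibm" and "eps" are never adjacent.
    print([d for d in docs if phrase_match(index[d], my_analyzer2("ibm eps"))])  # []

    # Query analyzed with the SAME analyzer as the index: exact phrase only.
    print([d for d in docs if phrase_match(index[d], my_analyzer("ibm eps"))])  # ['ibm eps']
```

In this model the phrase only lines up when the query is analyzed with
the same analyzer as the index, which is the fix described above.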
clint
>
> My mapping:
> "properties" : {
>   "myquery" : {
>     "type" : "multi_field",
>     "fields" : {
>       "myquery" : { "type" : "string", "index_analyzer" : "myAnalyzer", "search_analyzer" : "myAnalyzer2" },
>       "myqueryUntouched" : { "type" : "string", "index" : "not_analyzed" }
>     }
>   },
>   ...
>
> My settings:
> "analysis" : {
>   "analyzer" : {
>     "myAnalyzer" : {
>       "tokenizer" : "standard",
>       "filter" : ["standard", "lowercase", "stop", "myNGram"]
>     },
>     "myAnalyzer2" : {
>       "tokenizer" : "standard",
>       "filter" : ["standard", "lowercase", "stop"]
>     }
>   },
>   "filter" : {
>     "myNGram" : {
>       "type" : "nGram",
>       "min_gram" : 1,
>       "max_gram" : 8
>     }
>   }
>
> My query:
> "query":{
> "query_string":{
> "default_field":"myquery",
> "default_operator":"AND",
> "query":"\"ibm eps\""
> }
> }
>
>
> If I remove the escaped quotes, I get many results, as I expect, such as:
> ibm eps
> ibm q2 eps
> ibm 2001 eps
>
> If someone adds the quotes, though, I want only the "ibm eps" results.