edgeNGram weirdness

Axsuul
Hi,

I'm having trouble getting an edgeNGram query to behave properly. I have one record, "blue grass", indexed through an edgeNGram filter with a min_gram of 2. A query string of "blv" nevertheless returns "blue grass", although it shouldn't.

curl -X POST http://localhost:9200/test -d '{
    "mappings": {
        "product/fragrance": {
            "properties": {
                "name_query": {
                    "index_analyzer": "query_index_analyzer",
                    "search_anaylzer": "query_search_analyzer",
                    "as": {},
                    "type": "string"
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "filter": {
                "query_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 20,
                    "side": "front"
                }
            },
            "analyzer": {
                "query_index_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": ["asciifolding", "query_edgengram"]
                },
                "query_search_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": ["asciifolding"]
                }
            }
        }
    }
}'

curl -X POST "http://localhost:9200/test/product%2Ffragrance/1" -d '{
    "name_query": "blue grass"
}'

curl -X GET "http://localhost:9200/test/product%2Ffragrance/_search?load=true&pretty=true" -d '{
    "query": {
        "bool": {
            "must": [{
                "query_string": {
                    "query": "blv",
                    "fields": ["name_query"],
                    "default_operator": "OR"
                }
            }]
        }
    }
}'

For some reason, that search returns a result. Can anyone explain why? What I want is for "blv" not to match "blue grass", although "bl" should. I've used the analyze API and see "blue grass" being broken down into "bl", "blu", "blue", "gr", "gra", "gras", "grass", and "blv" doesn't match any of those. Thanks.
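
The token list above can be reproduced outside Elasticsearch. The following is a minimal Python sketch that only approximates the two analyzers in the settings; it is not Elasticsearch code, and every function name in it is hypothetical. It assumes the "lowercase" tokenizer splits on non-letters and lowercases, and it stands in for "asciifolding" with a rough Unicode decomposition. The last check illustrates one possible explanation, offered as an assumption rather than a confirmed diagnosis: if the query were analyzed with the index analyzer instead of the search analyzer, "blv" would itself be split into "bl" and "blv", and "bl" is an indexed term. Note that the mapping above spells the key "search_anaylzer", so the intended search analyzer may not be the one in effect.

```python
import re
import unicodedata


def lowercase_tokenize(text):
    # Approximation of the "lowercase" tokenizer: split on non-letters, lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def ascii_fold(token):
    # Rough stand-in for "asciifolding": drop combining marks after NFKD.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def edge_ngrams(token, min_gram=2, max_gram=20):
    # Front-side edge n-grams, mirroring min_gram=2 / max_gram=20 in the settings.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_analyze(text):
    # Mimics query_index_analyzer: tokenize, fold, then emit edge n-grams.
    return [g for t in lowercase_tokenize(text) for g in edge_ngrams(ascii_fold(t))]


def search_analyze(text):
    # Mimics query_search_analyzer: tokenize and fold only, no n-grams.
    return [ascii_fold(t) for t in lowercase_tokenize(text)]


def matches(query_terms, indexed_terms):
    # OR semantics: any query term present among the indexed terms is a hit.
    indexed = set(indexed_terms)
    return any(term in indexed for term in query_terms)


indexed = index_analyze("blue grass")
print(indexed)
print(matches(search_analyze("bl"), indexed))   # intended behaviour for "bl"
print(matches(search_analyze("blv"), indexed))  # intended behaviour for "blv"
print(matches(index_analyze("blv"), indexed))   # query analyzed with the index analyzer
```

Comparing the last two calls shows how the same query string can miss or hit depending only on which analyzer is applied at search time.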

Re: edgeNGram weirdness

dadoonet
I answered this on Stack Overflow:

http://stackoverflow.com/questions/12909844/query-string-returning-results-not-found-in-edgengram/12911582#12911582
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 Oct 2012, at 10:39, Axsuul <[hidden email]> wrote:

> Hi,
>
> I'm having trouble getting an edgeNGram query to behave properly. I have one
> record "blue grass" with an edgeNGram minimum of 2. A query string of "blv"
> however returns "blue grass" although it shouldn't.