Completion Suggester and Analyzer


Paweł Młynarczyk
Hello

I'm trying out the new completion suggester feature.
I'm using the simple analyzer at both index and search time. I have "nirvana nevermind" as a completion input, yet a completion text starting with "never" returns nothing. I expected this to work, since the analyzer splits "nirvana nevermind" into two separate tokens.

I'm using the example data from the Elasticsearch website:

curl -X PUT localhost:9200/music
curl -X PUT localhost:9200/music/song/_mapping -d '{
    "song" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggest" : {
                "type" : "completion",
                "index_analyzer" : "simple",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

but I've changed the indexed item a bit:

curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : {
        "input" : [ "nirvana nevermind" ],
        "output" : "Nirvana - Nevermind"
    }
}'

And this query doesn't return anything:

curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "never",
        "completion" : {
            "field" : "suggest"
        }
    }
}'

I know I could handle this by adding more inputs, but I'm concerned about the size of the index once the list of possible user inputs per item gets huge.

Is there a way to analyze terms to match my expectations?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Completion Suggester and Analyzer

Alexander Reelsen-2
Hey Pawel,

right now the suggester is a pure prefix suggester. This means the input you indexed was "nirvana nevermind" as a whole, so you only get suggestions back when you type a prefix of it, such as "nirv". As a workaround you could index several inputs, like "Nirvana" and "Nevermind". So

"input": [ "Nirvana", "Nevermind" ],
"output" : "Nirvana - Nevermind"

would make your use case work. Also, if you are worried about the size, you can easily monitor it per field using the nodes stats API, see more at:


In the long term, it would make sense to support Lucene's AnalyzingInfixSuggester as well.
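To make the prefix behaviour concrete, here is a toy model in shell. It is not Elasticsearch code: it assumes, as the replies describe, that each completion input is analyzed into one single token sequence and that a query only matches the beginning of that whole sequence. The simple_analyze helper is a rough, hypothetical stand-in for the simple analyzer (split on non-letters, lowercase), and the "input|output" entry format is made up for this sketch.

```shell
#!/bin/sh
# Toy model of a pure prefix suggester; NOT Elasticsearch code.
# Assumption: each completion input becomes ONE analyzed token sequence,
# and a query only matches if it is a prefix of that whole sequence.

# Rough stand-in for the "simple" analyzer: split on non-letters, lowercase.
simple_analyze() {
  printf '%s\n' "$1" | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# Join the tokens back into one string: the whole input is a single entry.
analyze_input() {
  simple_analyze "$1" | paste -sd ' ' -
}

# Print the output of every "input|output" entry (hypothetical format)
# whose analyzed input starts with the analyzed query.
suggest() {
  query=$(analyze_input "$1")
  shift
  for entry in "$@"; do
    input=${entry%%|*}
    output=${entry#*|}
    case $(analyze_input "$input") in
      "$query"*) printf '%s\n' "$output" ;;
    esac
  done
}

# One input "nirvana nevermind": "never" is not a prefix of the whole entry.
suggest "never" "nirvana nevermind|Nirvana - Nevermind"   # prints nothing
suggest "nirv" "nirvana nevermind|Nirvana - Nevermind"    # prints: Nirvana - Nevermind

# Workaround from the thread: several inputs sharing one output.
suggest "never" "Nirvana|Nirvana - Nevermind" "Nevermind|Nirvana - Nevermind"
# prints: Nirvana - Nevermind
```

The model shows why the tokenization alone does not help: the tokens are still stored as one ordered sequence, so only the first token can start a match. Adding "Nevermind" as its own input gives the suggester a second sequence that does start with "never".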

Hope this helps.


--Alex



In reply to Paweł Młynarczyk's message of Mon, Oct 7, 2013 at 3:23 PM.

Re: Completion Suggester and Analyzer

simonw-2
In reply to this post by Paweł Młynarczyk
This suggester is in fact a prefix suggester: it only operates on the prefixes you add, and it completes them.
You said you are worried about the size of the index. I can assure you this approach takes you extremely far before size becomes an issue. The compression for this kind of data is immense, and I have personal experience with exactly the same problems. Don't worry too much about index size here unless you have tens of billions of records with many different prefixes. If your data looks like <artist> <song_name>, you can easily index the combinations [ <artist>-<song_name>, <song_name>, <song_name>-<artist> ] as separate inputs without issues. We are working on solutions that help with these situations, but they won't use less space.
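A minimal sketch of that suggestion in shell: build the three orderings for one artist and song and print them as a JSON array usable as the completion field's "input". The suggest_inputs name is hypothetical. The parts are joined with a space rather than a hyphen (an assumption; the simple analyzer drops the hyphen anyway, since it splits on non-letters), and the values are not JSON-escaped, so names containing double quotes would need extra handling.

```shell
#!/bin/sh
# Hypothetical helper: emit the three input orderings for one artist/song
# pair as a JSON array for the completion field's "input".
# Assumption: names contain no double quotes (no JSON escaping is done).
suggest_inputs() {
  # $1 = artist, $2 = song
  printf '[ "%s", "%s", "%s" ]\n' "$1 $2" "$2" "$2 $1"
}

suggest_inputs "Nirvana" "Nevermind"
# prints: [ "Nirvana Nevermind", "Nevermind", "Nevermind Nirvana" ]
```

All three inputs would share one "output" such as "Nirvana - Nevermind", so each extra ordering adds only another prefix path for the same suggestion.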

simon