trivial example of keyword_repeat?

trivial example of keyword_repeat?

Nikita Tovstoles
Could someone please share a trivial example of using the Keyword Repeat Token Filter to “[Emit] each incoming token twice once as keyword and once as a non-keyword to allow an un-stemmed version of a term to be indexed side by side to the stemmed version of the term”?

Maybe I don’t understand its intent, but is the idea to tokenize the string “one two” into “one”, “two”, and “one two” (the last being unstemmed)?

If so, I tried the example config from the docs, and the input was not preserved:

http://localhost:9200/_analyze?pretty=true&text=one%20two&filters=lowercase,keyword_repeat,porter_stem,unique

{
  "tokens" : [ {
    "token" : "one",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "two",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

Shouldn't there be 3 tokens: “one”, “two”, and “one two”?

This is on v1.0.1.

BTW, it seems to make more sense to use the ‘keyword’ tokenizer instead of ‘standard’ (since the latter splits “one two” before the filter is even applied), but that fails to return “one” and “two”:

http://localhost:9200/_analyze?pretty=true&text=one%20two&filters=lowercase,keyword_repeat,porter_stem,unique&tokenizer=keyword
{
  "tokens" : [ {
    "token" : "one two",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  } ]
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/93d58a7d-07c1-485f-afee-3c2f1e9b994f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: trivial example of keyword_repeat?

simonw-2
The repeat filter only makes a difference for terms that actually get stemmed. I.e., if you have "goes", it will be stemmed to "go", but with the repeat filter "goes" will also be emitted in addition to "go".
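To make that concrete, here is a minimal Python sketch of the filter chain. It is not Elasticsearch or Lucene code: the `Token` class, the `toy_stem` function (which only strips a trailing "es" and is not the Porter stemmer), and the same-position de-duplication are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    term: str
    position: int
    keyword: bool = False  # keyword tokens are left alone by the stemmer


def keyword_repeat(tokens: List[Token]) -> List[Token]:
    """Emit every token twice at the same position: a keyword copy, then a normal copy."""
    out = []
    for t in tokens:
        out.append(replace(t, keyword=True))
        out.append(replace(t, keyword=False))
    return out


def toy_stem(tokens: List[Token]) -> List[Token]:
    """Hypothetical stand-in for a stemmer: strips a trailing 'es' from non-keyword tokens."""
    out = []
    for t in tokens:
        if not t.keyword and len(t.term) > 3 and t.term.endswith("es"):
            t = replace(t, term=t.term[:-2])
        out.append(t)
    return out


def unique_same_position(tokens: List[Token]) -> List[Token]:
    """Drop a token if the same term was already seen at the same position."""
    seen = set()
    out = []
    for t in tokens:
        key = (t.term, t.position)
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def analyze(text: str) -> List[Tuple[str, int]]:
    """Whitespace tokenizer -> lowercase -> keyword_repeat -> toy_stem -> unique."""
    tokens = [Token(word.lower(), i + 1) for i, word in enumerate(text.split())]
    tokens = unique_same_position(toy_stem(keyword_repeat(tokens)))
    return [(t.term, t.position) for t in tokens]
```

With this sketch, `analyze("goes")` gives `[("goes", 1), ("go", 1)]`: the keyword copy survives unstemmed next to the stemmed copy. `analyze("one two")` gives `[("one", 1), ("two", 2)]`, because the toy stemmer changes neither word, so both copies are identical and the duplicate is dropped. The filter never joins tokens back together, so a "one two" token cannot come out of it.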

Does that make sense?

simon
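For anyone who wants to try this as a named analyzer rather than through `_analyze` URL parameters, the sketch below builds an index settings body in Python. The analyzer name `my_analyzer` and the filter name `unique_stem` are placeholders, and the use of `only_on_same_position` on the `unique` filter is an assumption based on the docs example; check it against the documentation for your version before relying on it.

```python
import json


def build_settings(analyzer_name: str = "my_analyzer") -> dict:
    """Return an index settings body with a keyword_repeat + stemmer analyzer.

    The names `my_analyzer` and `unique_stem` are hypothetical placeholders.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Only remove duplicates that share a position, so an
                    # unstemmed copy is dropped only when stemming changed nothing.
                    "unique_stem": {
                        "type": "unique",
                        "only_on_same_position": True,
                    }
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # keyword_repeat must come before the stemmer.
                        "filter": [
                            "lowercase",
                            "keyword_repeat",
                            "porter_stem",
                            "unique_stem",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_settings(), indent=2))
```

The order of the filter list matters here: if the stemmer ran before `keyword_repeat`, both copies would already be stemmed and nothing unstemmed would be left to keep.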

On Thursday, March 13, 2014 12:38:00 AM UTC+1, Nikita Tovstoles wrote:
Could someone please share a trivial example of using the Keyword Repeat Token Filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-repeat-tokenfilter.html) to “[Emit] each incoming token twice once as keyword and once as a non-keyword to allow an un-stemmed version of a term to be indexed side by side to the stemmed version of the term”?

Maybe I don’t understand its intent, but is the idea to tokenize the string “one two” into “one”, “two”, and “one two” (the last being unstemmed)?

If so, I tried the example config from the docs, and the input was not preserved:

http://localhost:9200/_analyze?pretty=true&text=one%20two&filters=lowercase,keyword_repeat,porter_stem,unique

{
  "tokens" : [ {
    "token" : "one",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "two",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

Shouldn't there be 3 tokens: “one”, “two”, and “one two”?

This is on v1.0.1.

BTW, it seems to make more sense to use the ‘keyword’ tokenizer instead of ‘standard’ (since the latter splits “one two” before the filter is even applied), but that fails to return “one” and “two”:

http://localhost:9200/_analyze?pretty=true&text=one%20two&filters=lowercase,keyword_repeat,porter_stem,unique&tokenizer=keyword
{
  "tokens" : [ {
    "token" : "one two",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  } ]
}
