keyword tokenizer

keyword tokenizer

paul
My analyzer settings look like this:

"autocomplete_index": {
    "type": "custom",
    "tokenizer": "keyword",
    "filter": [
        "lowercase",
        "syns_filter",
        "my_edgeNgram"
    ]
}

Now when I test this analyzer with the Analyze API, the word after the space is dropped, i.e. "university" never shows up:

................../universityindextest2/_analyze?analyzer=autocomplete_index&text=yale%20university&pretty

output
----------

{
  "tokens" : [ {
    "token" : "ya",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "yal",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "yale",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "yu",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 4
  } ]
}
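One way to see what the tokenizer handed to the filters is to look at the offsets in the response above: every token runs from 0 to 15, and 15 is the length of the whole input "yale university". The short Python sketch below (not part of the original post; the response is simply re-entered as compact JSON, and `spans_whole_input` is a hypothetical helper) checks that mechanically.

```python
import json

# The Analyze API response posted above, re-entered as compact JSON.
raw = """
{"tokens": [
  {"token": "ya",   "start_offset": 0, "end_offset": 15, "type": "word", "position": 1},
  {"token": "yal",  "start_offset": 0, "end_offset": 15, "type": "word", "position": 2},
  {"token": "yale", "start_offset": 0, "end_offset": 15, "type": "word", "position": 3},
  {"token": "yu",   "start_offset": 0, "end_offset": 15, "type": "word", "position": 4}
]}
"""
response = json.loads(raw)
text = "yale university"

def spans_whole_input(resp, source):
    """True if every token's offsets cover the entire input string."""
    return all(
        tok["start_offset"] == 0 and tok["end_offset"] == len(source)
        for tok in resp["tokens"]
    )

print(len(text))                          # 15
print(spans_whole_input(response, text))  # True
```

If the tokenizer had split on the space, any grams built from "university" would carry offsets 5 to 15 instead.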

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d6bd7caa-b160-42ac-948c-6aab6884a51d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: keyword tokenizer

Binh Ly
Paul, is it possible that your "syns_filter" is interfering with your edge n-gram filter? What happens when you remove syns_filter?

On Wednesday, January 22, 2014 6:17:12 AM UTC-5, paul wrote:
> [...]

Re: keyword tokenizer

paul
Binh, when I removed syns_filter the result was still the same, but when I changed "tokenizer": "keyword" to "whitespace", "university" was taken into account. Maybe it's a tokenizer problem: when there is a space, the keyword tokenizer omits the word after it.
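For reference, a sketch of what the changed analyzer could look like, written as a Python dict and printed as JSON so it can be used as the body of an index-creation request. The `my_edgeNgram` parameters (min_gram 2, max_gram 4) are assumptions chosen to match the ya/yal/yale output in the first post, since the thread never shows that filter's definition; `syns_filter` is left out for the same reason. The filter type is written `edge_ngram`, the name current Elasticsearch releases use; 1.x-era examples like this thread often spell it `edgeNGram`.

```python
import json

# Hypothetical request body for creating the index. The "my_edgeNgram" values
# are guesses; the thread does not show the real filter definition.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_edgeNgram": {
                    "type": "edge_ngram",  # often spelled "edgeNGram" in older examples
                    "min_gram": 2,
                    "max_gram": 4,
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    # whitespace splits "yale university" into two tokens,
                    # so each word gets its own edge n-grams
                    "tokenizer": "whitespace",
                    # syns_filter omitted: its definition is not in the thread;
                    # put it back between the two filters if you still need it
                    "filter": ["lowercase", "my_edgeNgram"],
                }
            },
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Re-running the same _analyze request against an index created with settings like these should then produce grams for both words.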

-paul


On Wed, Jan 22, 2014 at 11:00 PM, Binh Ly <[hidden email]> wrote:
> [...]

Re: keyword tokenizer

Binh Ly
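Paul, yes, you are correct, I missed that. The keyword tokenizer takes your entire string and makes it into a single token, "yale university". The edge n-gram filter then only emits prefixes of that one token, so no gram can ever start with "university"; that's why it is not being n-grammed.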
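To make the difference concrete, here is a small pure-Python simulation of the two analysis chains (an illustration, not Elasticsearch code). It models only the lowercase filter and a front edge n-gram filter; min_gram=2 and max_gram=4 are assumed values that reproduce the ya/yal/yale tokens in paul's output, and syns_filter is not modeled at all.

```python
import re

def keyword_tokenize(text):
    """Keyword tokenizer: the entire input becomes a single token."""
    return [(text, 0, len(text))]

def whitespace_tokenize(text):
    """Whitespace tokenizer: one token per run of non-whitespace characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def edge_ngrams(tokens, min_gram, max_gram):
    """Front edge n-grams: leading substrings of each token, offsets unchanged."""
    grams = []
    for term, start, end in tokens:
        for n in range(min_gram, min(max_gram, len(term)) + 1):
            grams.append((term[:n], start, end))
    return grams

def analyze(text, tokenize, min_gram=2, max_gram=4):
    """tokenizer -> lowercase -> edge n-grams (syns_filter is not modeled)."""
    lowered = [(term.lower(), start, end) for term, start, end in tokenize(text)]
    return [term for term, _, _ in edge_ngrams(lowered, min_gram, max_gram)]

print(analyze("yale university", keyword_tokenize))
# ['ya', 'yal', 'yale']
print(analyze("yale university", whitespace_tokenize))
# ['ya', 'yal', 'yale', 'un', 'uni', 'univ']
```

With the keyword tokenizer every gram is a prefix of "yale university": a larger max_gram would add "yale ", "yale u", and so on, but never a gram that starts with "u". "university" is not really omitted; it just never sits at the front of a token.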

On Friday, January 24, 2014 12:03:34 AM UTC-5, paul wrote:
> [...]