Edge n-gram analyzers do not behave as expected

narinder.izap
Hi All,

I have a custom analyzer based on the edge n-gram tokenizer. My expectation was that it would analyze the text word by word, starting from the leading edge of each word. As I understand it, analyzing a multi-word term such as "Narinder Kaur" should give:
N
Na
Nar
Nari
Narin
Narind
Narinde
Narinder
K
Ka
Kau
Kaur

So if I search for narinder or kaur using either of the following queries:

{
  "query": {
    "constant_score": {
      "query": {
        "match_phrase_prefix": {
          "primary_search_new": {
            "query": "narinder",
            "analyzer": "ys_search_analyzer_long"
          }
        }
      }
    }
  }
}

OR

{
  "query": {
    "constant_score": {
      "query": {
        "match_phrase_prefix": {
          "primary_search_new": {
            "query": "kaur",
            "analyzer": "ys_search_analyzer_long"
          }
        }
      }
    }
  }
}

both should match the documents containing "Narinder Kaur". But currently I cannot search for kaur; it works only when the first term matches. The analyzers used are as follows:


analysis: {
  analyzer: {
    ys_search_analyzer: {
      type: custom
      filter: [
        ys_word_delimiter
        trim
        lowercase
      ]
      tokenizer: ys_edge_ngram_tokenizer
    }
    ys_search_analyzer_long: {
      type: custom
      filter: [
        ys_word_delimiter
        trim
        lowercase
      ]
      tokenizer: ys_edge_ngram_tokenizer_long
    }
  }
  filter: {
    ys_word_delimiter: {
      type: word_delimiter
      stem_english_possessive: False
    }
  }
  tokenizer: {
    ys_edge_ngram_tokenizer_long: {
      type: edgeNGram
      min_gram: 1
      max_gram: 60
    }
    ys_edge_ngram_tokenizer: {
      min_gram: 1
      type: edgeNGram
      max_gram: 20
    }
  }
}



Please explain why it is not working as expected, and what I should do to meet my requirement without re-indexing the data.

All help is appreciated.
thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/76753cd7-7a47-4ca3-ba7b-90be025386b4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Edge n-gram analyzers do not behave as expected

Masaru Hasegawa
Hi,

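You need to specify token_chars when you configure the edge ngram tokenizer (http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html). Otherwise, all characters are kept, which means words are not split on whitespace.
You can see how the analyzer works with the _analyze API (http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html).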
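
To make the effect of token_chars concrete, here is a rough Python model of the tokenizer stage only. It is a sketch, not Elasticsearch code: the function name edge_ngrams is hypothetical, str.isalnum stands in for the letter and digit character classes, and the token filters from the posted analyzers (word_delimiter, trim, lowercase) are not modelled.

```python
def edge_ngrams(text, min_gram=1, max_gram=60, token_chars=None):
    """Rough model of an edge n-gram tokenizer (illustration only).

    token_chars=None keeps every character, so the whole text is one token.
    Otherwise token_chars is a predicate (str -> bool) naming the characters
    that belong to a token; anything else acts as a separator.
    """
    if token_chars is None:
        words = [text] if text else []
    else:
        words, current = [], []
        for ch in text:
            if token_chars(ch):
                current.append(ch)
            elif current:
                words.append("".join(current))
                current = []
        if current:
            words.append("".join(current))

    grams = []
    for word in words:
        # Emit prefixes of each word, from min_gram up to max_gram characters.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


# No token_chars: every gram is a prefix of the full string "Narinder Kaur".
no_split = edge_ngrams("Narinder Kaur")

# Letters and digits only: each word gets its own prefixes.
split = edge_ngrams("Narinder Kaur", token_chars=str.isalnum)

print(no_split)
print(split)
```

In this model, the first list never contains a gram that starts with "K", while the second list matches the twelve tokens expected in the original question.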

You need to fix the analyzer and re-index all documents.
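
As a hedged sketch of what the fixed tokenizer definitions might look like, the Python snippet below adds token_chars to the two tokenizers posted in the question and prints the resulting settings fragment as JSON. The helper name add_token_chars and the choice of "letter" and "digit" are assumptions for illustration; pick the character classes that suit your data. The snippet only builds the settings body; creating an index with it and re-indexing the documents are not shown.

```python
import json

# Tokenizer definitions as posted in the question.
tokenizers = {
    "ys_edge_ngram_tokenizer_long": {"type": "edgeNGram", "min_gram": 1, "max_gram": 60},
    "ys_edge_ngram_tokenizer": {"type": "edgeNGram", "min_gram": 1, "max_gram": 20},
}


def add_token_chars(definitions, token_chars=("letter", "digit")):
    """Return a copy of the definitions with token_chars set on each tokenizer.

    Hypothetical helper: "letter" and "digit" are an assumed choice, so that
    whitespace and punctuation separate words.
    """
    return {
        name: {**config, "token_chars": list(token_chars)}
        for name, config in definitions.items()
    }


fixed = add_token_chars(tokenizers)

# Settings fragment to merge into the "analysis" section of a new index.
print(json.dumps({"analysis": {"tokenizer": fixed}}, indent=2))
```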


Masaru

Re: Edge n-gram analyzers do not behave as expected

narinder.izap
Thanks for your reply. It is much clearer now how to

On Friday, 27 March 2015 05:07:34 UTC+5:30, Masaru Hasegawa wrote:
Hi,

You need to specify token_chars when you configure the edge ngram tokenizer (http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html). Otherwise, all characters are kept, which means words are not split on whitespace.
You can see how the analyzer works with the _analyze API (http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html).

You need to fix the analyzer and re-index all documents.


Masaru

Re: Edge n-gram analyzers do not behave as expected

narinder.izap
In reply to this post by Masaru Hasegawa
Thanks for the reply. I will try it.
