Highlighter problem

paul
I am trying out the highlighter feature of Elasticsearch. The text marked in yellow is expected, but why did it match the text marked in green?

Elasticsearch = 0.90.0
Java = 1.7

The analyzer on that field is "autocomplete"; its configuration is below:

"autocomplete":{
   "type":"custom",
   "tokenizer":"standard",
   "filter":[
      "standard",
      "lowercase",
      "syns_filter",
      "my_edgeNgram"
   ]
},

My query:
{
  "fields": [
    "name"
  ],
  "query": {
    "match": {
      "name": "univ"
    }
  },
  "highlight": {
    "pre_tags": [
      "<tag1>"
    ],
    "post_tags": [
      "</tag1>"
    ],
    "fields": {
      "name": {}
    }
  }
}


Results:

{
   "fields":{
      "name":"SUNY Binghamton University"
   },
   "highlight":{
      "name":[
         "SUNY <tag1>Bing</tag1>hamton <tag1>Univ</tag1>ersity"
      ]
   }
}

{
   "fields":{
      "name":"Arizona State University"
   },
   "highlight":{
      "name":[
         "Arizona State <tag1>Univ</tag1>ersity"
      ]
   }
}

{
   "fields":{
      "name":"Ohio State University"
   },
   "highlight":{
      "name":[
         "Ohio <tag1>State</tag1> <tag1>University</tag1>"
      ]
   }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5d4a683f-a208-4a17-bce1-0248f43dbcd6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Highlighter problem

Adrien Grand-2
Hi Paul,

Can you also paste your definitions of "syns_filter" and "my_edgeNgram"?


On Mon, Dec 16, 2013 at 7:28 AM, paul <[hidden email]> wrote:
I am trying out the highlighter feature of Elasticsearch. The text marked in yellow is expected, but why did it match the text marked in green?


--
Adrien Grand


Re: Highlighter problem

paul
Sure Adrien, below are my definitions:

"filter":{
   "syns_filter":{
      "synonyms_path":"synonyms/synonym_collegename.txt",
      "type":"synonym",
      "ignore_case":true
   },
   "my_edgeNgram":{
      "type":"edgeNGram",
      "min_gram":3,
      "max_gram":10
   }
}

Regards
Paul

On Monday, 16 December 2013 15:30:47 UTC+5:30, Adrien Grand wrote:
Hi Paul,

Can you also paste your definitions of "syns_filter" and "my_edgeNgram"?



Re: Highlighter problem

Adrien Grand-2
I think the answer is in the content of the synonyms file. For example, if there is an entry in this file that looks like "Binghamton, Binghamton University", then in the end the analyzer is going to produce something like "bin", "bing", ..., "uni", "univ", ... for a token whose term is "Binghamton" (your edge n-grams start at min_gram = 3). So if you search for "univ", it is actually going to highlight the "bing" of "Binghamton".
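
A rough Python model of that analysis chain may make the effect easier to see. This is only a sketch, not Lucene code. The synonym map is hypothetical (the real synonyms file was not posted), and the offset rule is an assumption: injected synonym terms start at the offset of the source token, and each gram ends at start + gram length. That rule was chosen because it reproduces the first result in the original post. The query is matched as the single term "univ", which ignores query-time analysis.

```python
def analyze(text, synonyms, min_gram=3, max_gram=10):
    """Model of: whitespace split -> lowercase -> synonyms -> edge n-grams.

    Returns (term, start_offset, end_offset) triples.
    """
    tokens = []
    cursor = 0
    for word in text.split():
        start = text.index(word, cursor)
        end = start + len(word)
        cursor = end
        term = word.lower()
        # Assumption: synonym terms share the start offset of the source token.
        for variant in [term] + synonyms.get(term, []):
            longest = min(max_gram, len(variant))
            for size in range(min_gram, longest + 1):
                # Assumption: a gram of length n covers the first n characters
                # of the source token, whatever term it came from.
                tokens.append((variant[:size], start, min(start + size, end)))
    return tokens


def highlight(text, query_term, tokens, pre="<tag1>", post="</tag1>"):
    """Wrap every span whose indexed term equals query_term."""
    spans = sorted({(s, e) for term, s, e in tokens if term == query_term})
    pieces, last = [], 0
    for s, e in spans:
        if s < last:  # skip overlapping spans
            continue
        pieces.append(text[last:s])
        pieces.append(pre + text[s:e] + post)
        last = e
    pieces.append(text[last:])
    return "".join(pieces)


# Hypothetical synonym rule: "binghamton" also emits "university".
SYNONYMS = {"binghamton": ["university"]}

text = "SUNY Binghamton University"
print(highlight(text, "univ", analyze(text, SYNONYMS)))
# SUNY <tag1>Bing</tag1>hamton <tag1>Univ</tag1>ersity
```

In this model the gram "univ" derived from the injected synonym carries offsets that point at the first four characters of "Binghamton", which is why "Bing" ends up wrapped in the tags.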

I don't think there is a simple solution to your problem. Since you seem to be using this index for auto-completion purposes, maybe a better option would be to not use synonyms in the analyzer but to add a separate document for every synonym.
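
As an illustration of the "separate document for every synonym" idea, here is a minimal sketch. The field name "canonical_name" and the example synonyms are hypothetical; only "name" comes from the query in the original post.

```python
def documents_for(canonical_name, synonyms):
    """Build one document per name variant, each pointing back to the canonical name."""
    names = [canonical_name] + [s for s in synonyms if s != canonical_name]
    return [{"name": n, "canonical_name": canonical_name} for n in names]


docs = documents_for(
    "SUNY Binghamton University",
    ["Binghamton", "Binghamton University"],  # hypothetical synonyms
)
for doc in docs:
    print(doc)
```

Each variant is then analyzed on its own text, so the highlighted offsets always point into the string that was actually indexed for that document.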

On a side note, since you are doing auto-completion, maybe you could have a look at the completion suggester[1]. Although it doesn't support highlighting, I would expect it to be an order of magnitude faster than index-based autocompletion so this might be worth checking out.
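
For orientation, the request bodies involved might look roughly like the sketch below, built as Python dictionaries. The type name "college", the field name "suggest" and the suggestion name "college-suggest" are hypothetical. The "completion" field type, the "input"/"output" keys and the shape of the suggest request follow the completion suggester documentation of the 0.90.x/1.x era as far as recalled here, so please check them against the docs for your version; the suggester may also require a release newer than 0.90.0.

```python
import json

# Mapping with a dedicated completion field (names are hypothetical).
mapping = {
    "college": {
        "properties": {
            "name": {"type": "string"},
            "suggest": {"type": "completion"},
        }
    }
}


def suggest_document(name, alternatives):
    """Document whose alternative names all complete to the same output."""
    return {
        "name": name,
        "suggest": {"input": [name] + alternatives, "output": name},
    }


def suggest_request(prefix, field="suggest"):
    """Body of a completion suggest request for the given prefix."""
    return {"college-suggest": {"text": prefix, "completion": {"field": field}}}


print(json.dumps(mapping, indent=2))
print(json.dumps(suggest_document("SUNY Binghamton University", ["Binghamton"]), indent=2))
print(json.dumps(suggest_request("univ"), indent=2))
```

Listing the alternative names as inputs of a single document covers the synonym use case without a synonym filter in the analyzer.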

On Tue, Dec 17, 2013 at 6:03 AM, paul <[hidden email]> wrote:
Sure Adrien  below is my definitions ,

--
Adrien Grand


Re: Highlighter problem

paul
Thank you Adrien, I will read search-suggesters-completion and see whether it suits my requirements.

Regards
Paul


On Tue, Dec 17, 2013 at 12:54 PM, Adrien Grand <[hidden email]> wrote:
I think the answer is in the content of the synonyms file. For example if there is an entry in this file that looks like "Binghamton, Binghamton University", in the end the analyzer is going to produce something like "b", "bi", ..., "bing", ..., "u", "un", ..., "univ", ... for a token whose term is "Binghamton". So if you search for "univ", it is actually going to highlight the "bing" of "Binghamton".
