Elastic search synonym match involving numeric characters

Elastic search synonym match involving numeric characters

Siva Shanmuga Subramanian Murugan

I have documents indexed in an Elasticsearch cluster with the mapping below. Basically, I have a field named model which holds car model names like "Silverado 2500HD", "Silverado 1500HD", "LX 350", etc.

POST /location-test-no-boost {
    "settings":{
        "analysis":{
            "analyzer":{
                "mysynonym":{
                    "tokenizer":"standard",
                    "filter":[
                        "standard","lowercase","stop","mysynonym"
                    ],
                    "ignore_case":true
                }
            },
            "filter":{
                "mysynonym":{
                    "type":"synonym",
                    "synonyms": [
                            "2500 HD=>2500HD",
                            "chevy silverado=>Silverado"
                        ]
                }
            }
        }
    },
    "mappings":{
        "vehicles":{
            "properties":{
                "id":{
                    "type":"long",
                    "ignore_malformed":true
                },
                "model":{
                    "type":"String",
                    "index_analyzer": "standard",
                    "search_analyzer":"mysynonym"
                }
            }
        }
    }
}

The sample document content is:

POST /location-test-no-boost/vehicles/10
{
  "model" : "Silverado 2500HD"
}

When I search with the query string "Chevy Silverado", the synonym maps correctly to Silverado and the document is returned. In contrast, when I search with the query string "2500 HD", I get 0 results. I tried different combinations of synonyms involving numbers and concluded that the Elasticsearch synonym filter does not support numbers. Is this correct?

Is there any way to set up a mapping so that when a user searches for "2500 HD", the query is mapped to "2500HD"?

--
Please update your bookmarks! We have moved to https://discuss.elastic.co/
---
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d6e2183b-fa43-491d-a441-9c442926f492%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Elastic search synonym match involving numeric characters

xinmeike
Maybe you should debug like this:

POST /location-test-no-boost/_analyze?pretty&analyzer=mysynonym
"2500 HD"

This returns the analysis result. You can compare it to:

POST /location-test-no-boost/_analyze?pretty&analyzer=standard
"Silverado 2500HD"

That way you can see where the problem is.
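
The comparison suggested above can also be scripted. What follows is only a rough sketch in Python: the helper names (analyze_url, token_texts, fetch_tokens) are made up for illustration, the host http://myES is a placeholder, and the query-parameter form of _analyze is assumed to match the Elasticsearch version used in this thread (later versions may expect a JSON body instead). The sample response is hand-written so the parsing part can run without a cluster.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def analyze_url(base: str, index: str, analyzer: str) -> str:
    """Build the _analyze URL for one analyzer of one index (hypothetical helper)."""
    query = urlencode({"pretty": "true", "analyzer": analyzer})
    return f"{base}/{index}/_analyze?{query}"


def token_texts(response_body: str) -> List[str]:
    """Extract just the token strings from an _analyze JSON response."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]


def fetch_tokens(base: str, index: str, analyzer: str, text: str) -> List[str]:
    """POST the raw text to _analyze and return the tokens (needs a live cluster)."""
    req = Request(analyze_url(base, index, analyzer),
                  data=text.encode("utf-8"), method="POST")
    with urlopen(req) as resp:
        return token_texts(resp.read().decode("utf-8"))


# Hand-written sample response, used here instead of a real cluster call.
sample = '{"tokens": [{"token": "2500", "position": 1}, {"token": "hd", "position": 2}]}'
url = analyze_url("http://myES", "location-test-no-boost", "mysynonym")
tokens = token_texts(sample)
print(url)
print(tokens)
```

Against a real cluster, calling fetch_tokens once with the search analyzer and the query text, and once with the index analyzer and the document text, gives two token lists to compare. A search can only match if the query tokens also appear among the indexed tokens.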


Re: Elastic search synonym match involving numeric characters

xinmeike
In reply to this post by Siva Shanmuga Subramanian Murugan
I think you should change the order of your filters, like this:

"analysis": {
      "analyzer": {
        "mysynonym": {
          "tokenizer": "standard",
          "filter": [
            "mysynonym","standard","lowercase", "stop"
          ],
          "ignore_case": true
        }
      }
}

That's because filters work like an assembly line: each filter receives the output of the one before it. With the original order, "2500 HD" has already been turned into the separate tokens "2500" and "hd" by the time it reaches mysynonym, so the synonym rule does not match.

// Before swapping the filter order
$ curl -XPOST "http://myES/my_test/_analyze?pretty&analyzer=mysynonym" -d "2500 HD"
{
  "tokens" : [ {
    "token" : "2500",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<NUM>",
    "position" : 1
  }, {
    "token" : "hd",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}
// After swapping the filter order
$ curl -XPOST "http://myES/my_test/_analyze?pretty&analyzer=mysynonym" -d "2500 HD"
{
  "tokens" : [ {
    "token" : "2500hd",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "SYNONYM",
    "position" : 1
  } ]
}
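
The assembly-line behaviour described above can be mimicked with a small toy model in Python. This is only a sketch, not how Elasticsearch implements analysis: the tokenizer is a plain whitespace split, the "standard" and "stop" filters are left out, and the synonym filter is assumed to match case-sensitively against the rule as written ("2500 HD"). All function names are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Tokens = List[str]
TokenFilter = Callable[[Tokens], Tokens]


def tokenize(text: str) -> Tokens:
    """Toy stand-in for a tokenizer: split on whitespace only."""
    return text.split()


def lowercase(tokens: Tokens) -> Tokens:
    """Lowercase every token, like a lowercase token filter."""
    return [t.lower() for t in tokens]


def make_synonym_filter(rules: Dict[Tuple[str, ...], str]) -> TokenFilter:
    """Build a case-sensitive filter that replaces a run of tokens with one token."""
    def apply(tokens: Tokens) -> Tokens:
        out: Tokens = []
        i = 0
        while i < len(tokens):
            for lhs, rhs in rules.items():
                if tuple(tokens[i:i + len(lhs)]) == lhs:
                    out.append(rhs)      # rule matched: emit the replacement
                    i += len(lhs)
                    break
            else:
                out.append(tokens[i])    # no rule matched: pass the token through
                i += 1
        return out
    return apply


def analyze(text: str, filters: List[TokenFilter]) -> Tokens:
    """Run the tokenizer, then each filter in order on the previous output."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


# Assumed rule, written with the same casing as in the original settings.
synonym = make_synonym_filter({("2500", "HD"): "2500HD"})

before = analyze("2500 HD", [lowercase, synonym])  # synonym filter runs last
after = analyze("2500 HD", [synonym, lowercase])   # synonym filter runs first
print(before)
print(after)
```

In this model the synonym filter placed last never sees the tokens "2500" and "HD" as written in the rule, so nothing is replaced. Placed first, it rewrites them to "2500HD", which the lowercase filter then turns into "2500hd".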


I sincerely hope this helps.


Re: Elastic search synonym match involving numeric characters

Siva Shanmuga Subramanian Murugan
Wow, this fixed the issue. Thanks a lot for your help.

Regards
Siva
