Can we use Chinese characters in a wildcard query?

Can we use Chinese characters in a wildcard query?

mohit.Kumar
Hi folks,
I am trying to search for data using a Chinese keyword, but the query always returns zero hits.

Elasticsearch query:

{"query":{"wildcard" : { "text" : " 好不* " }}}

I am using a Java program to send this query. I have tried converting the query to UTF-8, but I still get no data.

Thanks in advance.

Regards
Mohit

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Can we use Chinese characters in a wildcard query?

InquiringMind
The prefix query (snippet below) works for me. For example:

{"prefix" : {"words" : "醫"}}

I haven't tried a wildcard query in Java, since it is rather like a very slow grep and not generally useful. A pattern with only a trailing wildcard is logically the same as a prefix query, and prefix queries are typically rather fast in my experience.
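
To make the trailing-wildcard point concrete, here is a minimal Java sketch. It is not Elasticsearch or Lucene code: it only emulates wildcard matching against whole terms with java.util.regex, and the class and method names are hypothetical. The literals use Unicode escapes so the file compiles whatever its source encoding: \u597d\u4e0d is 好不 and \u5bb9\u6613 is 容易. It assumes the pattern is compared with indexed terms exactly as typed (wildcard patterns are normally not analyzed), so the spaces in a pattern such as " 好不* " count as literal characters.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not Elasticsearch or Lucene code.
class WildcardSketch {

    // Translate a wildcard pattern ('*' = any run of characters,
    // '?' = exactly one character) into a regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' || c == '?') {
                flushLiteral(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Quote pending literal text so regex metacharacters in it are inert.
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // True when the whole term (not just a part of it) matches the pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        String prefix = "\u597d\u4e0d";              // two-character prefix
        String term = prefix + "\u5bb9\u6613";       // four-character term
        String trailing = prefix + "*";              // trailing wildcard only
        String padded = " " + prefix + "* ";         // same pattern, spaces kept

        System.out.println(matches(trailing, term));       // trailing wildcard
        System.out.println(term.startsWith(prefix));       // equivalent prefix test
        System.out.println(matches(padded, term));         // spaces are literal
        System.out.println(matches(padded.trim(), term));  // trimmed pattern
    }
}
```

Running main prints true, true, false, true: the trailing wildcard and the plain prefix test agree, while the untrimmed pattern matches nothing. Whether a multi-character pattern can match at all also depends on how the field was tokenized at index time, which this sketch does not model.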

I hope this helps!

Brian

In reply to the post by Mohit Kumar Yadav of Monday, October 21, 2013, 9:04:48 AM UTC-4.

Re: Can we use Chinese characters in a wildcard query?

InquiringMind
In reply to this post by mohit.Kumar
By the way, when I first tried to create a working example using my local test/dev index, my Chinese characters were missing and queries against them did not work. I don't recall exactly when I last deleted and reloaded that index, nor exactly which ES versions I have changed between since then. I am currently running ES 0.90.3, and I believe the index was deleted and recreated (with successful regression tests that included Chinese characters) no earlier than 0.90.0. So I don't have any logs to show, just results.

The query below runs against a synonym "table". (Yeah, I know. But I find that a separate query for synonyms means that changing synonyms does not require a reload or reindex of the data. And performance is very good.)

{
  "bool" : {
    "must" : [ {
      "match" : {
        "field" : {
          "query" : "gn",
          "type" : "boolean"
        }
      }
    }, {
      "prefix" : {
        "words" : "醫"
      }
    } ]
  }
}
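
Since the original question mentions UTF-8 conversion in a Java program, here is a minimal sketch of building the query above as a string and encoding it explicitly as UTF-8 instead of relying on the platform default charset. It uses only the JDK; the class and method names are hypothetical, no Elasticsearch client is involved, and the hand-built JSON assumes the inserted values need no escaping. The escape \u91ab stands for 醫.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; no Elasticsearch client is involved.
class Utf8BodySketch {

    // Build the bool query shown above. Assumes the two values contain
    // no quotes or backslashes, so no JSON escaping is performed.
    static String buildQuery(String matchText, String prefix) {
        return "{\"bool\":{\"must\":["
             + "{\"match\":{\"field\":{\"query\":\"" + matchText + "\",\"type\":\"boolean\"}}},"
             + "{\"prefix\":{\"words\":\"" + prefix + "\"}}"
             + "]}}";
    }

    // Encode with an explicit charset rather than the platform default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that are known to be UTF-8.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = buildQuery("gn", "\u91ab");
        byte[] utf8 = toUtf8(body);

        // A UTF-8 round trip keeps the Chinese character intact.
        System.out.println(fromUtf8(utf8).equals(body));

        // A charset that cannot represent the character damages the body.
        byte[] latin1 = body.getBytes(StandardCharsets.ISO_8859_1);
        String damaged = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(damaged.equals(body));

        // The single Chinese character takes three bytes in UTF-8.
        System.out.println(utf8.length - body.length());
    }
}
```

Running main prints true, false and 2. The same kind of round-trip check can be applied to documents before they are loaded into an index, although nothing in this thread confirms that encoding was the cause of the null value reported below.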


1. When I first used my current laptop set-up to get a working example, nothing was found. When I queried one of the English terms, the following result came back. Note that the last value is expected to be a Chinese phrase but comes out null instead:

{ "field" : [ "gn" , "o" , "cnam" ] , "words" : [ "Dr" , "Doctor" , "MD" , "Phd" , null ] }

2. After deleting and reloading the index, the query now returns all words including the Chinese:

{ "field" : [ "gn" , "o" , "cnam" ] , "words" : [ "Dr" , "Doctor" , "MD" , "Phd" , "醫生" ] }

I'm not sure why this happened, since it has always worked, starting with my initial ES version 19.4, and had never failed until today.

Brian
