Sort Chinese error

yin weifeng
I use the following code:
"SearchResponse searchResponse = client.prepareSearch("My_db")
                                       .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                                       .setQuery(queryBuilder)
                                       .addSort("xxx", SortOrder.DESC)..."
I want to sort by the field "xxx", but when the value of "xxx" contains
Chinese characters or symbols such as '#' or '%', ES throws an error:
Query Failed [Failed to execute main query]. What should I do?

Re: Sort Chinese error

Ivan Brusic
What is the exact error? One possible cause is that field "xxx" is
analyzed. Sorting needs a field that produces a single term per
document, so you would normally sort on a non-analyzed field.

On Wed, Feb 1, 2012 at 9:42 PM, 伟峰 殷 <[hidden email]> wrote:
> I use the following code:
> "SearchResponse searchResponse = client.prepareSearch("My_db")
>                                       .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
>                                       .setQuery(queryBuilder)
>                                       .addSort("xxx", SortOrder.DESC)..."
> I want to sort by the field "xxx", but when the value of "xxx" contains
> Chinese characters or symbols such as '#' or '%', ES throws an error:
> Query Failed [Failed to execute main query]. What should I do?

Re: Sort Chinese error

yin weifeng
Thanks Ivan!

I want to run term searches and also sort on the same field. Should I index the same content into two separate fields, or is there another way?


Re: Sort Chinese error

Jan Fiedler
For correct, locale-specific sorting you should create a separate field for sorting purposes. This is best done via the multi-field mapping (http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html).

The sort field should use a special (sort) analyzer that performs collation for Chinese. In simple terms, a collator takes your term and calculates a sorting key (that does not resemble the term). Take a look at the ICU plugin and its collators (http://www.elasticsearch.org/guide/reference/index-modules/analysis/icu-plugin.html).
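
To make the idea of a sorting key concrete, here is a minimal sketch using the JDK's own java.text.Collator rather than the ICU plugin. The class and method names (CollationKeySketch, sortKey, compareByKey) are made up for illustration, and the JDK's Chinese rules are not necessarily the same as ICU's, so treat the resulting order as indicative only.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationKeySketch {
    // Returns the opaque binary key the collator derives from a term.
    // The bytes are meant for comparison only and do not spell out the term.
    static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    // Compares two terms through their collation keys instead of their raw text.
    static int compareByKey(Collator collator, String a, String b) {
        return collator.getCollationKey(a).compareTo(collator.getCollationKey(b));
    }

    public static void main(String[] args) {
        // Assumption: the JDK collator for Locale.CHINESE stands in for an ICU collator.
        Collator collator = Collator.getInstance(Locale.CHINESE);
        // Sample terms written as escapes: U+4E2D, U+6587, U+963F, U+5317.
        String[] terms = {"\u4e2d", "\u6587", "\u963f", "\u5317"};

        // Sort by collation key, which is what a collation-based sort field would store.
        Arrays.sort(terms, (a, b) -> compareByKey(collator, a, b));
        for (String term : terms) {
            System.out.println(term + " -> " + Arrays.toString(sortKey(collator, term)));
        }
    }
}
```

Comparing the keys always gives the same result as asking the collator to compare the original terms, which is why an index can store only the key in the sort field.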

Re: Sort Chinese error

yin weifeng
Hi Jan,

Thank you for your help!
With this mapping I can now both search and sort on the field:
...
"fields" : {
     "fieldName" : {"type" : "string", "index" : "analyzed"},
     "sortFieldName" : {"type" : "string", "index" : "not_analyzed"}
}
...

But for Chinese, sorting on a "not_analyzed" field does not seem to give a meaningful order.
How do I define the special sort analyzer, for example one that sorts phonetically?
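
A likely reason the order looks meaningless, assuming a not_analyzed string field is ordered by the raw term value: the characters are then compared by their Unicode code points, which say nothing about pronunciation. The sketch below uses plain Java string comparison as a stand-in for that raw ordering; CodePointOrderSketch and sortByCodeUnits are illustrative names, not part of Elasticsearch.

```java
import java.util.Arrays;

public class CodePointOrderSketch {
    // Sorts terms with String.compareTo, i.e. by UTF-16 code unit value.
    // Assumption: this approximates how a raw, not_analyzed term is ordered.
    static String[] sortByCodeUnits(String... terms) {
        String[] copy = terms.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // U+963F (a), U+5317 (bei), U+6587 (wen), U+4E2D (zhong), listed in pinyin order.
        String[] sorted = sortByCodeUnits("\u963f", "\u5317", "\u6587", "\u4e2d");
        // The result follows the code point values, not the pinyin readings.
        System.out.println(String.join(" ", sorted));
    }
}
```

Here the term read "zhong" (U+4E2D) sorts first and the term read "a" (U+963F) sorts last, the opposite of what a phonetic order would give.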


2012/2/3 Jan Fiedler <[hidden email]> wrote:
> For correct, locale specific sorting you should create a separate field for sorting purposes. This is best done via the multi-field mapping ( http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html).
>
> The sort field should use a special (sort) analyzer that performs collation for Chinese. In simple terms, a collator takes your term and calculates a sorting key (that does not resemble the term). Take a look at the ICU plugin and its collators (http://www.elasticsearch.org/guide/reference/index-modules/analysis/icu-plugin.html)

Re: Sort Chinese error

Jan Fiedler
In reply to this post by yin weifeng
You are close but not there yet. To get Chinese sorting right you need the following 3 additional steps:

1. Configure a sort analyzer for your sort field

...
"fields" : {
     "fieldName" : {"type" : "string", "index" : "analyzed"},
     "sortFieldName" : {"type" : "string", "index" : "analyzed", "analyzer" : "my_chinese_sort"}
}
...
 

2. Configure the sort analyzer (e.g. in elasticsearch.yml)

index:
  analysis:
    analyzer:
      my_chinese_sort:
        type: custom
        tokenizer: keyword
        filter: [icu_collation_chinese]

    filter:
      icu_collation_chinese:
        type: icu_collation
        language: ch


I am not sure about the actual language identifier to be used for Chinese. I trust that ICU supports Chinese (I did not try it).

3. Install the ICU plugin

Run the following from your ES home:

bin/plugin -install elasticsearch/elasticsearch-analysis-icu/1.1.0


Re: Sort Chinese error

yin weifeng
Thank you!

You are right: the configuration really does make the field sort by Chinese pinyin. I think ICU is doing the work.

However, the configuration in elasticsearch.yml had no effect, so I used

curl -XPOST localhost:9200/backlog_db -d '{
    "settings":{
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_chinese_sort" : {
                        "type" : "custom",
                        "tokenizer" : "keyword",
                        "filter" : ["icu_collation_chinese"]
                    }
                },
                "filter" : {
                    "icu_collation_chinese" : {
                        "type" : "icu_collation",
                        "language" : "ch"
                    }
                }
            }
        }
    }
}'

to configure the sort analyzer.

Re: Sort Chinese error

Jan Fiedler
I recommend using the analyze API (interactively, via curl) to test whether your analyzer settings actually made it into your index. You can find information on using the analyze API here: http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html
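
As a rough sketch of what such a check could look like from Java instead of curl, the helper below only builds the request URL. It assumes the analyze endpoint accepts `analyzer` and `text` as query parameters, which holds for older Elasticsearch releases but should be checked against the linked documentation for your version. AnalyzeRequestSketch and analyzeUri are hypothetical names; the host, index, and analyzer values are taken from this thread.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AnalyzeRequestSketch {
    // Builds a URL for the analyze endpoint of one index.
    // Assumption: "analyzer" and "text" are accepted as query parameters.
    static URI analyzeUri(String host, String index, String analyzer, String text) {
        String query = "analyzer=" + URLEncoder.encode(analyzer, StandardCharsets.UTF_8)
                + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URI.create("http://" + host + "/" + index + "/_analyze?" + query);
    }

    public static void main(String[] args) {
        // The sample text is U+4E2D; it is percent-encoded as UTF-8 in the URL.
        URI uri = analyzeUri("localhost:9200", "backlog_db", "my_chinese_sort", "\u4e2d");
        System.out.println(uri);
    }
}
```

If the analyzer is registered on the index, requesting this URL should return a single token holding the collation key; an error about an unknown analyzer would suggest the settings were not applied.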

Re: Sort Chinese error

yin weifeng
Now it works well on Windows 7, but when I run it on Linux (Ubuntu 11.10 and CentOS 5.6), the sort results are different even though the configuration is the same.

What could be the reason?

Thanks!




2012/2/6 Jan Fiedler <[hidden email]> wrote:
> I recommend using the analyze API (via curl interactively) to test whether your analyzer settings made it correctly into your index. Find information on the analyzer API usage here: http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html