Sorting strings that contain numbers

Nick Hoffman
Hi guys. When sorting on a field that's a string, strings that contain numbers aren't sorted properly: the digits are compared character by character rather than as numbers.

For example, with these documents:
{ name: "Bob: 3 points" }
{ name: "Bob: 10 points" }
{ name: "Bob: 2 points" }

When ES sorts on the "name" field, the documents are returned in this order:
{ name: "Bob: 10 points" }
{ name: "Bob: 2 points" }
{ name: "Bob: 3 points" }

How can we get ES to return the documents in the following order?
{ name: "Bob: 2 points" }
{ name: "Bob: 3 points" }
{ name: "Bob: 10 points" }
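The two orders can be reproduced outside ES. The Python sketch below is an illustration, not ES code. A plain string comparison looks at one character at a time, so "10" sorts before "2" because "1" < "2", which matches what ES returns. A "natural" comparison instead treats each run of digits as a single integer. natural_key is a hypothetical helper name.

```python
import re

def natural_key(s):
    # re.split with a capturing group alternates: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", s)
    # Odd positions are digit runs: compare them as integers, not as characters.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

docs = ["Bob: 3 points", "Bob: 10 points", "Bob: 2 points"]

# Plain string comparison: "1" < "2" < "3", so "10" lands first.
print(sorted(docs))
# ['Bob: 10 points', 'Bob: 2 points', 'Bob: 3 points']

# Natural order: the digit runs are compared numerically.
print(sorted(docs, key=natural_key))
# ['Bob: 2 points', 'Bob: 3 points', 'Bob: 10 points']
```

Because the text and digit parts always alternate, two keys are only ever compared string-to-string or int-to-int. This sorts on the client side, so it illustrates the desired order rather than making ES produce it.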

Thanks,
Nick

Re: Sorting strings that contain numbers

dadoonet
IMHO you should index docs with zero-padded numbers, e.g. "02" instead of "2".
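A minimal sketch of the padding idea in Python, assuming the padded value is what gets indexed for sorting (for example in a separate sort-only field; that field is an assumption, not part of the suggestion). pad_numbers and its width parameter are hypothetical names.

```python
import re

def pad_numbers(s, width=2):
    # Left-pad every run of digits with zeros to `width` characters.
    # Runs already longer than `width` are left unchanged.
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), s)

docs = ["Bob: 3 points", "Bob: 10 points", "Bob: 2 points"]
print(sorted(docs, key=pad_numbers))
# ['Bob: 2 points', 'Bob: 3 points', 'Bob: 10 points']

# The padding only helps while no number is wider than `width`:
print(sorted(["Bob: 99 points", "Bob: 100 points"], key=pad_numbers))
# ['Bob: 100 points', 'Bob: 99 points']
```

The second call shows the limitation: padding produces the right order only while no number has more digits than width.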

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Nick Hoffman's message of 28 Oct. 2012, 17:48.

Re: Sorting strings that contain numbers

Nick Hoffman
Hi David. Prefixing numbers with zeros won't work, because it assumes that every number has a fixed, known maximum number of digits.

In reply to David Pilato's message of Sunday, 28 October 2012 15:01:37 UTC-4.

Re: Sorting strings that contain numbers

Axsuul
Related thread:
http://elasticsearch-users.115913.n3.nabble.com/Sorting-a-string-field-numerically-td4024557.html#a4024561

In reply to Nick Hoffman's message of Sunday, October 28, 2012 12:10:52 PM UTC-7.

Re: Sorting strings that contain numbers

Chris Male
In reply to this post by Nick Hoffman
Could you split the data into multiple fields? For example, have a string field "name" ("Bob", "Anne", etc.) and a numeric field "points" (3, 10, 2). Then sort on both fields, with name first.
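A sketch of the two-field approach, assuming hypothetical "name" and "points" fields. The Python sort shows the intended semantics (name first, points as a numeric tie-breaker); search_body shows the same order as an ES sort clause, assuming "name" is indexed as a single, non-analyzed term so that it can be sorted on.

```python
# Hypothetical documents after splitting "Bob: 3 points" into two fields.
docs = [
    {"name": "Bob", "points": 3},
    {"name": "Bob", "points": 10},
    {"name": "Anne", "points": 2},
    {"name": "Bob", "points": 2},
]

# Sort on name first, then on the numeric points field as a tie-breaker.
ordered = sorted(docs, key=lambda d: (d["name"], d["points"]))
print([(d["name"], d["points"]) for d in ordered])
# [('Anne', 2), ('Bob', 2), ('Bob', 3), ('Bob', 10)]

# The same order expressed as an ES search request body (assumes "name"
# is not analyzed, so each document has a single sortable term).
search_body = {
    "query": {"match_all": {}},
    "sort": [{"name": "asc"}, {"points": "asc"}],
}
```

Because points is numeric, 10 sorts after 3 without any padding.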

Re: Sorting strings that contain numbers

Nick Hoffman
I'd definitely do that if I could, Chris. Unfortunately, the strings I'm indexing are names of objects that can't be split. For example:

Megatron (UN-04)
The Amazing Spider-Man #44
G2 Optimus Prime

This is why the sorting has to happen within ES.

In reply to Chris Male's message of Sunday, 28 October 2012 22:40:02 UTC-4.

Re: Sorting strings that contain numbers

joergprante@gmail.com
Hi Nick,

Thanks for the inspiration; it's a great idea. I just hacked together a plugin that performs the desired sort by generating a natural sort key in a Lucene token filter.

https://github.com/jprante/elasticsearch-analysis-naturalsort

The README is a little short, but I hope it helps.

See also the test file https://github.com/jprante/elasticsearch-analysis-naturalsort/blob/master/src/test/java/org/elasticsearch/index/analysis/naturalsort/NaturalSortKeyTests.java

As a bonus, a collator key sort is included. Since the natural sort key extends the collator key, you have to add a "locale" parameter to the token filter if you want a locale-sensitive sort.
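Because ES sorts a string field by comparing its indexed values as plain strings, a token filter that fixes this has to emit a key whose ordinary string order is already the natural order. The Python sketch below shows one common encoding for such a key, prefixing each digit run with its length. It illustrates the idea under that assumption and is not the plugin's actual implementation (which, as described above, also builds on a collator key). natural_sort_key is a hypothetical name, and the two-digit length prefix assumes no digit run is longer than 99 digits.

```python
import re

def natural_sort_key(s):
    """Encode s so that plain string comparison gives natural order.

    Each run of digits becomes <2-digit length><digits without leading zeros>,
    so a number with fewer digits sorts before one with more digits, and
    numbers of equal length compare digit by digit.
    """
    def encode(m):
        digits = m.group(0).lstrip("0") or "0"
        return f"{len(digits):02d}{digits}"
    return re.sub(r"\d+", encode, s)

names = ["Bob: 3 points", "Bob: 10 points", "Bob: 2 points"]
print(natural_sort_key("Bob: 10 points"))
# Bob: 0210 points
print(sorted(names, key=natural_sort_key))
# ['Bob: 2 points', 'Bob: 3 points', 'Bob: 10 points']
```

Stripping leading zeros first means "007" and "7" produce the same key, so they sort as equal.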

Cheers,

Jörg

In reply to Nick Hoffman's message of Monday, October 29, 2012 3:54:01 AM UTC+1.

Re: Sorting strings that contain numbers

bbehling
Jörg,

Will your plugin work with ES version 1.3.4?

Re: Sorting strings that contain numbers

joergprante@gmail.com
Yes, there is a version for ES 1.3.4.

Jörg

In reply to bbehling's message of Fri, Jan 23, 2015 at 7:13 PM.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Sorting-strings-that-contain-numbers-tp4024553p4069501.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1422036835206-4069501.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.