Sorting a string field numerically

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Sorting a string field numerically

Axsuul
I have a field that is string type. Sometimes it will contain integer values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20. How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Nick Hoffman
Hah, I asked the exact same question:


On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
I have a field that is string type. Sometimes it will contain integer values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20. How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Axsuul
Awesome, thanks for pointing that out! I think leading zeros is a great solution. I'm now using 10 character length with leading zeros and it works well. Maybe you should consider using something conservative like 15 so that it will never surpass that.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Radu Gheorghe-2
In reply to this post by Nick Hoffman
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:

> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Nick Hoffman
Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?


On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="LlbGMsTz6BYJ">ni...@...> wrote:

> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Igor Motov-3
I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup. 

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"}  it wouldn't. In order to handle the latter case, we need to actually parse the string. 

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script . 

So, if we combine all these suggestions, we will get something like this:

{
  "query" : {
      "match_all": {}
  },
  "sort": {
      "_script":{
          "script" : "s = doc['\''foo'\''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format(\"%010d\",n)} else { s }",
          "type" : "string"
      }
  }
}'

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing. 

Here is a complete example if somebody wants to play with it: https://gist.github.com/3973641


On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:
Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?


On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:

> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Sorting a string field numerically

rpsandiford
This post has NOT been accepted by the mailing list yet.

Another index-time possibility is to use a magnitude notation prefix for all the numeric portions.

 

So – if the number is 1 digit long, prefix it with a ‘1’, if it is two digits, prefix with a ‘2’, and so on.  If you think you’ll exceed 10 digits anywhere, then use a two-digit prefix (i.e. 01 and 02 respectively for a 1 digit and a 2 digit number).  That gets around having long tokens (e.g. a whole bunch of leading zeroes) which might be an issue, and also allows you some more flexibility on how long an integer you might encounter without having to re-index everything to put more leading zeroes on the front if you exceed your initial ‘max length’ number.  (i.e. a 2 digit mag notation prefix will handle up to a 99 digit number)

 

So, 1, 2, 10, 20 would be indexed as 011, 012, 0210, 0220, which will sort ‘alphabetically’ (i.e. LTR sort) in the correct numeric sequence.

 

Bob Sandiford | Principal Engineer SirsiDynix

P: 800.288.8020 X6943 | [hidden email]

www.sirsidynix.com

 

Join the conversation: Like us on Facebook! Follow us on Twitter!

 

From: Igor Motov-3 [via ElasticSearch Users] [mailto:ml-node+[hidden email]]
Sent: Monday, October 29, 2012 9:58 AM
To: Bob Sandiford
Subject: Re: Sorting a string field numerically

 

I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup. 

 

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"}  it wouldn't. In order to handle the latter case, we need to actually parse the string. 

 

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script . 

 

So, if we combine all these suggestions, we will get something like this:

 

{

  "query" : {

      "match_all": {}

  },

  "sort": {

      "_script":{

          "script" : "s = doc['\''foo'\''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format(\"%010d\",n)} else { s }",

          "type" : "string"

      }

  }

}'

 

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing. 

 

Here is a complete example if somebody wants to play with it: https://gist.github.com/3973641

 


On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:

Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?



On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:

Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:


> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 


If you reply to this email, your message will be added to the discussion below:

http://elasticsearch-users.115913.n3.nabble.com/Sorting-a-string-field-numerically-tp4024557p4024574.html

To start a new topic under ElasticSearch Users, email [hidden email]
To unsubscribe from ElasticSearch Users, click here.
NAML

Bob.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Sorting a string field numerically

rpsandiford
This post has NOT been accepted by the mailing list yet.
In reply to this post by Igor Motov-3

P.S. – you may also want to consider decimals.  e.g. if “12.5” should sort before “12.12” (for example), then you’ll want to include a magnitude notation following a decimal (i.e. you’d have 312.15 and 312.212).  However, if “12.5” should sort after “12.12”, then you’ll want to avoid placing a magnitude notation prefix on digits following the decimal point (i.e. you’d have 312.5 and 312.12…)

 

Bob Sandiford | Principal Engineer SirsiDynix

P: 800.288.8020 X6943 | [hidden email]

www.sirsidynix.com

 

Join the conversation: Like us on Facebook! Follow us on Twitter!

 

From: Bob Sandiford
Sent: Monday, October 29, 2012 10:05 AM
To: 'Igor Motov-3 [via ElasticSearch Users]'
Subject: RE: Sorting a string field numerically

 

Another index-time possibility is to use a magnitude notation prefix for all the numeric portions.

 

So – if the number is 1 digit long, prefix it with a ‘1’, if it is two digits, prefix with a ‘2’, and so on.  If you think you’ll exceed 10 digits anywhere, then use a two-digit prefix (i.e. 01 and 02 respectively for a 1 digit and a 2 digit number).  That gets around having long tokens (e.g. a whole bunch of leading zeroes) which might be an issue, and also allows you some more flexibility on how long an integer you might encounter without having to re-index everything to put more leading zeroes on the front if you exceed your initial ‘max length’ number.  (i.e. a 2 digit mag notation prefix will handle up to a 99 digit number)

 

So, 1, 2, 10, 20 would be indexed as 011, 012, 0210, 0220, which will sort ‘alphabetically’ (i.e. LTR sort) in the correct numeric sequence.

 

Bob Sandiford | Principal Engineer SirsiDynix

P: 800.288.8020 X6943 | [hidden email]

www.sirsidynix.com

 

Join the conversation: Like us on Facebook! Follow us on Twitter!

 

From: Igor Motov-3 [via ElasticSearch Users] [hidden email]
Sent: Monday, October 29, 2012 9:58 AM
To: Bob Sandiford
Subject: Re: Sorting a string field numerically

 

I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup. 

 

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"}  it wouldn't. In order to handle the latter case, we need to actually parse the string. 

 

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script . 

 

So, if we combine all these suggestions, we will get something like this:

 

{

  "query" : {

      "match_all": {}

  },

  "sort": {

      "_script":{

          "script" : "s = doc['\''foo'\''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format(\"%010d\",n)} else { s }",

          "type" : "string"

      }

  }

}'

 

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing. 

 

Here is a complete example if somebody wants to play with it: https://gist.github.com/3973641

 


On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:

Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?



On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:

Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:


> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 


If you reply to this email, your message will be added to the discussion below:

http://elasticsearch-users.115913.n3.nabble.com/Sorting-a-string-field-numerically-tp4024557p4024574.html

To start a new topic under ElasticSearch Users, email [hidden email]
To unsubscribe from ElasticSearch Users, click here.
NAML

Bob.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Nick Hoffman
In reply to this post by Igor Motov-3
Hi Igor. Your suggestion to do a field lookup is definitely an improvement. How would the rest of your solution work if the field's value contains numeric and non-numeric characters? E.g.
{ "foo" : "hello 30 world" }
{ "foo" : "hello 2 world" }
{ "foo" : "44 bar" }

Thanks,
Nick


On Monday, 29 October 2012 09:57:51 UTC-4, Igor Motov wrote:
I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup. 

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"}  it wouldn't. In order to handle the latter case, we need to actually parse the string. 

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script . 

So, if we combine all these suggestions, we will get something like this:

{
  "query" : {
      "match_all": {}
  },
  "sort": {
      "_script":{
          "script" : "s = doc['\''foo'\''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format(\"%010d\",n)} else { s }",
          "type" : "string"
      }
  }
}'

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing. 

Here is a complete example if somebody wants to play with it: https://gist.github.com/3973641


On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:
Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?


On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:

> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Igor Motov-3
In this case, you would have to split the value of the foo field into words, check if each word is an integer and replace it with a number with leading zeros if it is. 

On Monday, October 29, 2012 3:22:47 PM UTC-4, Nick Hoffman wrote:
Hi Igor. Your suggestion to do a field lookup is definitely an improvement. How would the rest of your solution work if the field's value contains numeric and non-numeric characters? E.g.
{ "foo" : "hello 30 world" }
{ "foo" : "hello 2 world" }
{ "foo" : "44 bar" }

Thanks,
Nick


On Monday, 29 October 2012 09:57:51 UTC-4, Igor Motov wrote:
I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup. 

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"}  it wouldn't. In order to handle the latter case, we need to actually parse the string. 

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script . 

So, if we combine all these suggestions, we will get something like this:

{
  "query" : {
      "match_all": {}
  },
  "sort": {
      "_script":{
          "script" : "s = doc['\''foo'\''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format(\"%010d\",n)} else { s }",
          "type" : "string"
      }
  }
}'

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing. 

Here is a complete example if somebody wants to play with it: https://gist.github.com/3973641


On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:
Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?


On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:

> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Nick Hoffman
Ah, that's right. That should work quite well. The only question is how inefficient this will be. I searched for documentation on the performance of sort scripts, but couldn't find anything. Do you know what sort of negative impact this would have on performance?


On Monday, 29 October 2012 15:31:11 UTC-4, Igor Motov wrote:
In this case, you would have to split the value of the foo field into words, check if each word is an integer and replace it with a number with leading zeros if it is. 

On Monday, October 29, 2012 3:22:47 PM UTC-4, Nick Hoffman wrote:
Hi Igor. Your suggestion to do a field lookup is definitely an improvement. How would the rest of your solution work if the field's value contains numeric and non-numeric characters? E.g.
{ "foo" : "hello 30 world" }
{ "foo" : "hello 2 world" }
{ "foo" : "44 bar" }

Thanks,
Nick


On Monday, 29 October 2012 09:57:51 UTC-4, Igor Motov wrote:
I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup. 

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"}  it wouldn't. In order to handle the latter case, we need to actually parse the string. 

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script . 

So, if we combine all these suggestions, we will get something like this:

{
  "query" : {
      "match_all": {}
  },
  "sort": {
      "_script":{
          "script" : "s = doc['\''foo'\''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format(\"%010d\",n)} else { s }",
          "type" : "string"
      }
  }
}'

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing. 

Here is a complete example if somebody wants to play with it: https://gist.github.com/3973641


On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:
Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?


On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
    "query": {
        "custom_score": {
            "query": {
                "match_all": {}
            },
            "script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
        }
    }
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]> wrote:

> Hah, I asked the exact same question:
> https://groups.google.com/forum/#!topic/elasticsearch/gds3sZKA-SI
>
>
> On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:
>>
>> I have a field that is string type. Sometimes it will contain integer
>> values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
>> How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?
>
> --
>
>

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Clinton Gormley-2
On Mon, 2012-10-29 at 12:36 -0700, Nick Hoffman wrote:
> Ah, that's right. That should work quite well. The only question is
> how inefficient this will be. I searched for documentation on the
> performance of sort scripts, but couldn't find anything. Do you know
> what sort of negative impact this would have on performance?

Doing these things live will never perform as well as precalculating
them and storing the extra data.  You either pay once at index time, or
every time you search.

Personally I would add a field "sort_order" which would precalculate a
value which could be used for a simple string sort.

For instance, "Nexus 7" and "Nexus 10" might be indexed as "nexus
000007" and "nexus 000010", so a simple string sort would give you the
right order

clint


--


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

Igor Motov-3
I agree with Clinton, it will be fairly inefficient especially if you have large result sets since this script will be executed for every single record in your results. The best way to do it is to implement the same logic on the client or as a custom analyzer. 

On Monday, October 29, 2012 3:41:35 PM UTC-4, Clinton Gormley wrote:
On Mon, 2012-10-29 at 12:36 -0700, Nick Hoffman wrote:
> Ah, that's right. That should work quite well. The only question is
> how inefficient this will be. I searched for documentation on the
> performance of sort scripts, but couldn't find anything. Do you know
> what sort of negative impact this would have on performance?

Doing these things live will never perform as well as precalculating
them and storing the extra data.  You either pay once at index time, or
every time you search.

Personally I would add a field "sort_order" which would precalculate a
value which could be used for a simple string sort.

For instance, "Nexus 7" and "Nexus 10" might be indexed as "nexus
000007" and "nexus 000010", so a simple string sort would give you the
right order

clint


--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

rpsandiford
This post has NOT been accepted by the mailing list yet.
In reply to this post by Axsuul
(Sorry if this ends up being repeated - I originally did a 'reply' from my email and it didn't seem to show up :()

Another index-time possibility is to use a magnitude notation prefix for all the numeric portions.
 
 
 
So – if the number is 1 digit long, prefix it with a ‘1’, if it is two digits, prefix with a ‘2’, and so on.  If you think you’ll exceed 10 digits anywhere, then use a two-digit prefix (i.e. 01 and 02 respectively for a 1 digit and a 2 digit number).  That gets around having long tokens (e.g. a whole bunch of leading zeroes) which might be an issue, and also allows you some more flexibility on how long an integer you might encounter without having to re-index everything to put more leading zeroes on the front if you exceed your initial ‘max length’ number.  (i.e. a 2 digit mag notation prefix will handle up to a 99 digit number)
 
 
 
So, 1, 2, 10, 20 would be indexed as 011, 012, 0210, 0220, which will sort ‘alphabetically’ (i.e. LTR sort) in the correct numeric sequence.
 
 
P.S. – you may also want to consider decimals.  e.g. if “12.5” should sort before “12.12” (for example), then you’ll want to include a magnitude notation following a decimal (i.e. you’d have 312.15 and 312.212).  However, if “12.5” should sort after “12.12”, then you’ll want to avoid placing a magnitude notation prefix on digits following the decimal point (i.e. you’d have 312.5 and 312.12…)
 
Bob Sandiford | Principal Engineer | SirsiDynix
 
P: 800.288.8020 X6943 | [hidden email]
 
www.sirsidynix.com
 
 
 
Join the conversation: Like us on Facebook! Follow us on Twitter!
Bob.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

joergprante@gmail.com
In reply to this post by Axsuul
Hey,

yes, Clinton is of course right, it's more performant to let the Lucene analyzer create sort keys in advance.

An old technique from the era of the first commercial RDMS (back in the 70/80ties, when RAM was tight) is encoding numbers to be sorted with different lengths with a prefix containing the length. This encoding eliminates leading zeroes and allows binary sort.

I just hacked a plugin together with a Lucene token filter. An implementation of a natural sort key can be found at https://github.com/jprante/elasticsearch-analysis-naturalsort

Cheers,

Jörg

On Sunday, October 28, 2012 8:33:58 PM UTC+1, James Hu wrote:
Awesome, thanks for pointing that out! I think leading zeros is a great
solution. I'm now using 10 character length with leading zeros and it works
well. Maybe you should consider using something conservative like 15 so that
it will never surpass that.



--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Sorting-a-string-field-numerically-tp4024557p4024561.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
 
 
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Sorting a string field numerically

santhosh
This post has NOT been accepted by the mailing list yet.
This post was updated on .
Hi
  Can I get the result as the number ordered in the array from Elasticsearch Search using a script in ES API?

EX:
arr = [10044,10144,10153,10167,10090,10152, 1,3,200,100,2,10 ...] # 1million entry

From array(arr), i am expecting results as the number ordered in array from Elasticsearch.

I am using below API

GET /test_index_name/_search
{
  "query": {
    "function_score": {
      "query": {
        "constant_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                            "number" :  [10044,10144,10153,10167,10090,10152 ....]
                          }},
                        {
                            "query_string": {
                                "query": "keyword",
                                "default_field": "text"
                            }
                        }
                    ]
                }
            }
        }
      }
    }
  },
  "_source": ["number"],
  "size": 6,
  "from": 0
}

http://elasticsearch-users.115913.n3.nabble.com/How-to-get-the-result-based-on-custom-sorting-in-elasticsearch-td4075409.html

How to use the script in above API to get the result as i accepting?

Please help me
 
Loading...