Sorting on a date field

John Chang
I need to query docs, having results sorted by a date field.

Before using ElasticSearch, I cast my date (a long of milliseconds since the epoch) to a float and then sorted on that. I did the cast because I had read that Lucene could not sort on a long. There was an obvious loss of precision, and if the docs were too close in time the ordering might be off, which we tolerated. When we switched to ElasticSearch we did the same thing, running a script on the timestamp saved as a float; again it worked, with some tolerable loss of precision.

Now I want to improve the precision. In a different thread you advised me to save the date as a long and then do a script on that. I find that this also generally works, but it too lacks precision when the dates are close together (generally under 3 minutes or so apart). This is what I did (sortField referenced a long):

response = indexClient.search(
        Requests.searchRequest(getIndexName())
                .types(documentTypes)
                .searchType(SearchType.QUERY_THEN_FETCH)
                .source(SearchSourceBuilder.searchSource()
                        .query(QueryBuilders.customScoreQuery(QueryBuilders.queryString(query))
                                .script("doc['" + sortField + "'].value"))
                        .fields(fields)
                        .from(offset)
                        .size(max)
                        .explain(true)))
        .actionGet();
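A likely explanation for the "under 3 minutes" threshold, assuming the script's value ends up as the document's Lucene score (Lucene scores are 32-bit floats): a float has a 24-bit significand, so near a millisecond timestamp such as 1204351200000 (2008-03-01T06:00:00Z, the value that appears in the error messages later in this thread) adjacent float values are 2^17 = 131,072 ms, roughly 2.2 minutes, apart. The plain-Java sketch below has no ElasticSearch dependency; the class and method names are illustrative only.

```java
/**
 * Illustrative sketch (hypothetical class): shows how much millisecond precision is lost
 * when an epoch-millis long is squeezed into a 32-bit float, as happens if the value
 * ends up as a Lucene score.
 */
public class FloatScorePrecision {

    /** Spacing, in milliseconds, between adjacent float values near the given timestamp. */
    static float resolutionMillis(long epochMillis) {
        return Math.ulp((float) epochMillis);
    }

    /** True if two distinct timestamps become equal once cast to float, i.e. they tie. */
    static boolean collapse(long a, long b) {
        return (float) a == (float) b;
    }

    public static void main(String[] args) {
        long t = 1204351200000L; // 2008-03-01T06:00:00Z

        System.out.println("float spacing near t (ms): " + resolutionMillis(t));
        // For this particular t, a timestamp one second later rounds to the same float.
        System.out.println("t vs t+1s tie as float: " + collapse(t, t + 1000L));
        // The long values themselves still differ, so a sort on the long itself is exact.
        System.out.println("t vs t+1s equal as long: " + (t == t + 1000L));
    }
}
```

Float rounding never reverses the order of two timestamps, but it can make them tie, and tied documents have no well-defined order relative to each other. That fits the overlapping pages described in the follow-up message.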

I also tried storing the field as a string and sorting on that. It worked and the precision was better, but I could not get the sort order param to work: I get the same results whether I use SortOrder.ASC or DESC. This is what I did (sortField referenced a string):

response = indexClient.search(
        Requests.searchRequest(getIndexName())
                .types(documentTypes)
                .searchType(SearchType.DFS_QUERY_THEN_FETCH)
                .source(SearchSourceBuilder.searchSource()
                        .query(QueryBuilders.queryString(query))
                        .fields(fields)
                        .from(offset)
                        .size(max)
                        .explain(true)
                        .sort(sortField, SortOrder.DESC)))
        .actionGet();

1) Is loss of precision doing a script on a long expected? Is there anything I can do to improve precision?

2) If I wind up doing a sort on a string, how can I get the sort param to work?

3) I understand a sort is slower than a script; how much worse is this expected to be?

BTW #1: I know a script will change the score and a sort will not.  I'm not too worried about that.  
BTW #2: I need to use the from/size params for pagination; not sure if that impacts this decision.

Thanks.

Re: Sorting on a date field

John Chang
Perhaps a little more context might help:

Before, I tolerated the loss of precision because my search was not paginated. I just returned the top N docs, starting at 0. I could tolerate docs a little out of order, or do my own second sort on the result set to get it perfect.

Now I am paginating. So if I get 30 docs close in time and my page size is 10, my pages will be weird: the same doc can show up on pages 1, 2, and 3, and some docs might not show up on any.
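To see why exact sort keys matter for from/size paging, here is a plain-Java sketch (no ElasticSearch involved) that sorts 30 hypothetical docs, one second apart, by their full long timestamp, breaks any exact ties by a unique id, and slices pages of 10 the way from/size would. The Doc class, the id tie-breaker and the page helper are assumptions for illustration, not part of the ElasticSearch API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StablePaging {

    /** Hypothetical document: a unique id plus an epoch-millis timestamp. */
    static final class Doc {
        final String id;
        final long receivedMillis;

        Doc(String id, long receivedMillis) {
            this.id = id;
            this.receivedMillis = receivedMillis;
        }
    }

    /** Newest first on the exact long value; the unique id breaks exact ties deterministically. */
    static final Comparator<Doc> NEWEST_FIRST =
            Comparator.comparingLong((Doc d) -> d.receivedMillis).reversed()
                      .thenComparing(d -> d.id);

    /** Emulates from/size: sort the full result set, then take one slice of it. */
    static List<Doc> page(List<Doc> all, int from, int size) {
        List<Doc> sorted = new ArrayList<>(all);
        sorted.sort(NEWEST_FIRST);
        int end = Math.min(from + size, sorted.size());
        return from >= end ? new ArrayList<>() : new ArrayList<>(sorted.subList(from, end));
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        long base = 1204351200000L;
        for (int i = 0; i < 30; i++) {
            docs.add(new Doc("doc" + i, base + i * 1000L)); // 30 docs, one second apart
        }
        for (int p = 0; p < 3; p++) {
            List<Doc> pg = page(docs, p * 10, 10);
            System.out.println("page " + (p + 1) + ": " + pg.get(0).id + " .. " + pg.get(pg.size() - 1).id);
        }
    }
}
```

Because every doc has a distinct sort key, each one lands on exactly one page no matter how often the query is repeated.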

Thanks again.

Re: Sorting on a date field

ppearcy
Hey John,
  Maybe I am missing some context, as you refer to a previous thread,
but I'm sorting on date fields with no problem and they appear to have
a 1 second granularity.

Thanks,
Paul

In reply to John Chang (Nov 9, 5:52 pm).

Re: Sorting on a date field

kimchy
Hi,

   There is no problem sorting on a long (in Lucene or ElasticSearch). You can simply index a long value at, for example, 1 millisecond resolution, and then sort on it. The `date` type in elasticsearch actually indexes a long value, parsing the relevant date into a long. The resolution of the long value is based on the date string passed.
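The point that the long's resolution follows the date string can be illustrated with plain Java. This sketch uses java.time (Java 8+) purely for illustration, not whatever parser ElasticSearch uses internally: a string with whole seconds yields a millisecond value ending in 000, while one with a fractional part keeps the milliseconds.

```java
import java.time.Instant;

public class DateStringResolution {

    /** Epoch millis for an ISO-8601 instant such as "2008-03-01T06:00:00Z". */
    static long toEpochMillis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    public static void main(String[] args) {
        // Second resolution in the string: the millisecond part is zero.
        System.out.println(toEpochMillis("2008-03-01T06:00:00Z"));     // prints 1204351200000
        // Millisecond resolution in the string: the milliseconds survive.
        System.out.println(toEpochMillis("2008-03-01T06:00:00.123Z")); // prints 1204351200123
    }
}
```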

   When you sort, don't use script-based sorting; just add the field as a sort field. Its type will be autodetected and the proper sorting will be done.

-shay.banon

In reply to Paul (Wed, Nov 10, 2010, 8:56 AM).


Re: Sorting on a date field

John Chang
I'm afraid I must not be understanding your advice.  I guess I need more specifics about how you want me to (A) map, (B) index and (C) search the doc.

I read your response and then....

I tried mapping the field this way:
            "receivedDate" : {
                "index_name":	"receivedDate",
                "type":	"date",
                "index":	"analyzed",
                "store":	"yes",
                "term_vector":	"no",
                "boost":	1.0,
                "omit_norms":	"false",
                "omit_term_freq_and_positions":	"false"
            }

I then tried these combinations (data.getReceivedDateInUTC() returns a java.util.Date):

1)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", data.getReceivedDateInUTC()); 

When I searched on the doc with a script:
  .script("doc['" + sortField + "'].value")
it worked, but lacked precision with dates close in time (under ~ 3 min).

1.1)
I indexed the doc as in 1 above, and I searched on the doc with a sort:
  .sort(sortField)
I got a SearchPhaseExecutionException.

2)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", data.getReceivedDateInUTC().getTime()); 

I could not index the doc; I got:
java.lang.IllegalArgumentException: Invalid format: "1204351200000" is malformed at "0000"

I didn't expect this to work, but I was taking a guess because nothing else had worked.

3)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", Long.toString(data.getReceivedDateInUTC().getTime())); 

I could not index the doc; I got:
java.lang.IllegalArgumentException: Invalid format: "1204351200000" is malformed at "0000"
(same as #2 above)

Your response mentioned a date string: "The resolution of the long value is based on the date string passed." Hence this test.

4)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", data.getReceivedDateInUTC().toString()); 

I could not index the doc; I got:
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [receivedDate]

As in 3 above, this was a test based on my understanding of a date string from your response.

Thanks for your help.

Re: Sorting on a date field

kimchy
You have two options, first is to have a numeric (long) type for the field, second is to have a date type, which expects (by default) an ISO formatted date string.

If you use the Java API, you have two options:

1. If you want the field to be numeric, then just add the "milliseconds since epoch" value (Date.getTime()) to your Map of values.
2. If you want the field to be a date type, then either provide the formatted string yourself, or, pass a Date instance as the value of the field, it will automatically be formatted to its ISO format and indexed.

In both options, you don't need to set the mappings explicitly; they will be auto-detected. Since you already have a Date object, I suggest indexing it as-is (it will, in turn, be formatted as an ISO string and detected as the date type).
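A minimal sketch of the two options as the plain Java Map that would be handed to the index request, reusing the field name receivedDate from the mapping above. The ISO pattern used here (UTC, with milliseconds) is an assumption: it is one valid ISO-8601 form, but check it against the date format configured for the field. The last line of main also points at a likely reason attempt 4 failed: Date.toString() produces the pattern EEE MMM dd HH:mm:ss zzz yyyy, which is not ISO-8601.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class ReceivedDateDocuments {

    /** Option 1: numeric field holding milliseconds since the epoch. */
    static Map<String, Object> asNumeric(Date received) {
        Map<String, Object> document = new HashMap<String, Object>();
        document.put("receivedDate", received.getTime()); // autoboxed to Long
        return document;
    }

    /** Option 2: date field supplied as an ISO-8601 string in UTC, with milliseconds (assumed pattern). */
    static Map<String, Object> asIsoString(Date received) {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        Map<String, Object> document = new HashMap<String, Object>();
        document.put("receivedDate", iso.format(received));
        return document;
    }

    public static void main(String[] args) {
        Date received = new Date(1204351200000L);
        System.out.println(asNumeric(received));   // {receivedDate=1204351200000}
        System.out.println(asIsoString(received)); // {receivedDate=2008-03-01T06:00:00.000Z}
        // Not ISO-8601: EEE MMM dd HH:mm:ss zzz yyyy in the JVM's default time zone.
        System.out.println(received.toString());
    }
}
```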

Once you have indexed it, you will be able to sort on it by just adding it as a sort field in the search API.

Regarding the resolution, I am not sure what you mean by missing resolution, but make sure the data you index actually offers the resolution you expect (does the Date instance you get have that resolution?).
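One quick way to act on that advice is to check whether a sample of the Date values being indexed carries any millisecond information at all. If every value falls on a whole second, no mapping or sort setting can order documents more finely than that. The helper below is hypothetical and only inspects whatever sample it is given.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class TimestampResolutionCheck {

    /** True if every timestamp falls on a whole second, i.e. its millisecond part is zero. */
    static boolean allWholeSeconds(List<Date> dates) {
        for (Date d : dates) {
            if (d.getTime() % 1000L != 0L) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Date> sample = Arrays.asList(new Date(1204351200000L), new Date(1204351201000L));
        System.out.println(allWholeSeconds(sample)); // true: this sample has only second resolution
    }
}
```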

-shay.banon

In reply to John Chang (Thu, Nov 11, 2010, 12:27 AM).


Re: Sorting on a date field

John Chang
Thanks!  I got my test working, mapping the field as a date.  My problem before is that I was using a QueryBuilders.customScoreQuery with a sort, which produced the error.  I got rid of the customScoreQuery part and the sort worked great.

Unfortunately, my production dates are not mapped as dates; I foolishly mapped the long values as type string. Until I can reindex the dates as date fields, is there any way to get the sorting to work with full precision while the longs are indexed as strings? I'll need to sort them both ascending and descending. The string field is mapped as index=no, store=yes, in case that makes a difference.
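Whatever the index does, a string sort of epoch millis is lexicographic, so it matches numeric order only when every value has the same number of digits. Epoch-millis values after 2001-09-09 have 13 digits, but earlier ones have fewer. A common workaround is zero-padding to a fixed width before indexing; the width of 13 and the helper name below are assumptions for illustration, and the approach would still require reindexing. It also presumes the field is indexed at all: with index=no there may be no indexed terms to sort on, which could possibly explain why ASC and DESC returned the same order earlier in the thread, though that is an unverified guess.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PaddedMillis {

    /** Fixed width assumed for illustration; 13 digits covers epoch millis until the year 2286. */
    static final int WIDTH = 13;

    /** Zero-pads a non-negative epoch-millis value so that string order matches numeric order. */
    static String pad(long epochMillis) {
        if (epochMillis < 0) {
            throw new IllegalArgumentException("pre-1970 timestamps need a different encoding");
        }
        return String.format("%0" + WIDTH + "d", epochMillis);
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<String>();
        List<String> padded = new ArrayList<String>();
        for (long millis : new long[] {1204351200000L, 999999999999L}) {
            raw.add(Long.toString(millis));
            padded.add(pad(millis));
        }
        Collections.sort(raw);    // lexicographic: "1204351200000" sorts before "999999999999"
        Collections.sort(padded); // "0999999999999" now sorts before "1204351200000"
        System.out.println(raw);
        System.out.println(padded);
    }
}
```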

We don't have a way of reindexing quickly here - we are working on that.

Thanks again!