Count distinct value by date

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Count distinct value by date

shammes
I use ElasticSearch for statistical purposes and have recently switched from MySQL to ElasticSearch. My table looks as follows:

datetime | unique_identifier | some more fields...
--------------------------------------------------------
2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | xyz | ...
2013-05-01 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-02 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-03 | cde | ...
2013-05-03 | abc | ...
2013-05-03 | xyz | ...
2013-05-03 | abc | ...
2013-05-04 | abc | ...


Now I would like to have listed who many different unique_identifier per day are in the table. Taking the above table as an example, the result would look like this:

2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

For this I have always used the following MySQL query:
SELECT DATE (datetime), count (distinct unique_identifier)
FROM tablenname
GROUP BY DATE(datetime);

Unfortunately I could not find the right companion piece to it in ElasticSearch.
Can someone give me a hint?

Thank you very much
Sebastian
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct value by date

Rémy Turpin
Hello,

You must use date histogram facet : http://www.elasticsearch.org/guide/reference/api/search/facets/date-histogram-facet/

{
   
"query" : {
       
"match_all" : {}
   
},
   
"facets" : {
       
"histo1" : {
           
"date_histogram" : {
               
"field" : "datetime",
               
"interval" : "day"
           
}
       
}
   
}
}

You can set other interval, by hour, month, week, etc.

Le lundi 10 juin 2013 15:47:36 UTC+2, shammes a écrit :
I use ElasticSearch for statistical purposes and have recently switched from
MySQL to ElasticSearch. My table looks as follows:

datetime | unique_identifier | some more fields...
--------------------------------------------------------
2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | xyz | ...
2013-05-01 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-02 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-03 | cde | ...
2013-05-03 | abc | ...
2013-05-03 | xyz | ...
2013-05-03 | abc | ...
2013-05-04 | abc | ...


Now I would like to have listed who many different unique_identifier per day
are in the table. Taking the above table as an example, the result would
look like this:

2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

For this I have always used the following MySQL query:
SELECT DATE (datetime), count (distinct unique_identifier)
FROM tablenname
GROUP BY DATE(datetime);

Unfortunately I could not find the right companion piece to it in
ElasticSearch.
Can someone give me a hint?

Thank you very much
Sebastian



--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct value by date

shammes
In reply to this post by shammes
Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per day) the number of entries or when you set the "value_field" the numeric value of this field.

But i need a distinct count-value. In my example i need the total, how many different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct value by date

Rémy Turpin
Oh OK sorry, I think I have the same problem.
I'm interesting by the reply.

Le mardi 11 juin 2013 16:03:59 UTC+2, shammes a écrit :
Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per day)
the number of entries or when you set the "value_field" the numeric value of
this field.

But i need a distinct count-value. In my example i need the total, how many
different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?



--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320p4036361.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct value by date

q42jaap
Hi Shammes,

In 1.0 there might be some changes to the facet system that allows to nest facets. I think
might be something to look in to, however, the plugin is not compatible with 0.90, so it might be difficult to get it to work.

For now I don't see how to do this, but maybe Boaz can explain it better?

Jaap

On Tuesday, June 11, 2013 5:38:38 PM UTC+2, Rémy Turpin wrote:
Oh OK sorry, I think I have the same problem.
I'm interesting by the reply.

Le mardi 11 juin 2013 16:03:59 UTC+2, shammes a écrit :
Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per day)
the number of entries or when you set the "value_field" the numeric value of
this field.

But i need a distinct count-value. In my example i need the total, how many
different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?



--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320p4036361.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Reply | Threaded
Open this post in threaded view
|

Re: Count distinct value by date

Boaz Leskes
Hi Shammes, Remy, Jaap,

You could indeed use the faceted-date-histogram with an inlined term facet (with size=0 or 1) to get that information. The faceted dated histogram allows you to first group posting by date and then apply an arbitrary facet to every group. 

As Jaap already pointed out, the plugin is not compatible with version 0.90.0 and up of Elasticsearch. Version 0.90.0 came with a complete re-write of the internal in memory data structures that drive the faceting engine. That rewrite delivered a tremendous amount of memory savings so I highly recommend using it. Sadly, it also rendered my plugin to be incompatible and I simply didn't have the time to re-write things. If enough people need it, I might find some time to do it and make a 0.90.X compatible version (dropping other features like the hashed terms facet, which is simply incompatible). Of course, pull requests are welcome :)

About the 1.0 version of ES - we are currently working on a new powerful faceting engine that will allow to do this and much more by allow to nest facets. It will take a couple of month, though, before it's ready. 

Cheers,
Boaz


On Tuesday, June 11, 2013 11:01:35 PM UTC+2, Jaap Taal wrote:
Hi Shammes,

In 1.0 there might be some changes to the facet system that allows to nest facets. I think
might be something to look in to, however, the plugin is not compatible with 0.90, so it might be difficult to get it to work.

For now I don't see how to do this, but maybe Boaz can explain it better?

Jaap

On Tuesday, June 11, 2013 5:38:38 PM UTC+2, Rémy Turpin wrote:
Oh OK sorry, I think I have the same problem.
I'm interesting by the reply.

Le mardi 11 juin 2013 16:03:59 UTC+2, shammes a écrit :
Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per day)
the number of entries or when you set the "value_field" the numeric value of
this field.

But i need a distinct count-value. In my example i need the total, how many
different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?



--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320p4036361.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.