How to find the number of authors who have written between 2-3 books?

Mike
Assume each document is a book:  
{ title: "A", author: "Mike" }
{ title: "B", author: "Mike" }
{ title: "C", author: "Mike" }
{ title: "D", author: "Mike" }

{ title: "E", author: "John" }
{ title: "F", author: "John" }
{ title: "G", author: "John" }

{ title: "H", author: "Joe" }
{ title: "I", author: "Joe" }

{ title: "J", author: "Jack" }


What is the best way to find the number of authors who have written between 2 and 3 books?  In this case it would be 2: John and Joe.

I know I can run a terms aggregation on author with a very large size, then traverse the thousands of authors on the client side and count how many wrote between 2 and 3 books.  Is there a more efficient way to do this?  The cardinality aggregation is almost what I want, if only I could specify a min and max term count.
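For reference, the client-side approach described above can be sketched roughly as follows. This is only an illustration: the aggregation name `authors`, the size of 10000, and the helper names are assumptions, and the bucket list is simulated locally instead of coming from a real cluster. It also assumes `author` is indexed as a single un-analyzed term.

```python
from collections import Counter

# The ten sample books from the question.
BOOKS = [
    {"title": t, "author": a}
    for a, titles in [("Mike", "ABCD"), ("John", "EFG"), ("Joe", "HI"), ("Jack", "J")]
    for t in titles
]


def terms_agg_body(field="author", size=10000):
    """Request body for a terms aggregation; 'authors' is a made-up agg name."""
    return {"size": 0, "aggs": {"authors": {"terms": {"field": field, "size": size}}}}


def simulate_buckets(books):
    """Stand-in for the 'buckets' list a terms aggregation would return."""
    counts = Counter(b["author"] for b in books)
    return [{"key": a, "doc_count": n} for a, n in counts.most_common()]


def count_buckets_in_range(buckets, lo=2, hi=3):
    """Count buckets whose doc_count lies in [lo, hi], done on the client."""
    return sum(1 for b in buckets if lo <= b["doc_count"] <= hi)


print(count_buckets_in_range(simulate_buckets(BOOKS)))  # 2 (John and Joe)
```

The cost is that every author bucket has to travel to the client, which is the inefficiency the question is about.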


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/22fc4e6d-bcac-426c-a343-ff1d36fc25de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: How to find the number of authors who have written between 2-3 books?

Itamar Syn-Hershko
This is a Map/Reduce operation. IMO you'll be better off maintaining a ref-count document than trying to hack the aggregations framework to support this.

Another reason for doing it that way is that in a distributed environment some aggregations can't be computed to an exact value; the terms bucketing is one example. So if you need exact values, I'd go for a data model that provides them.

--

Itamar Syn-Hershko
http://code972.com | @synhershko
Freelance Developer & Consultant


On Fri, Jun 20, 2014 at 1:34 AM, Mike <[hidden email]> wrote:
Re: How to find the number of authors who have written between 2-3 books?

Mike
I'm OK with the count returned being an estimate.  Say in this simple example if it returned 1 for just Joe, or 3 for John, Joe, and Jack, that would be OK too.  I am also OK with restructuring my data in any way to get this number more efficiently.

You mentioned creating a reference count document.  How would that look?  One doc per unique author, with a count of the total number of books they wrote, so that I can then do a range aggregation on that number?  What if I wanted to find "the number of authors who have written between 2 and 3 books that have a title containing E, F, H, or I" (still 2 in this case, John and Joe)?
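One possible reading of the reference-count layout asked about here is sketched below. The field name `book_count` and the helper names are hypothetical, and the counting is done over in-memory dicts to keep the example runnable without a cluster; the range query body is only what such a request might look like.

```python
from collections import Counter

# The ten sample books from the question.
BOOKS = [
    {"title": t, "author": a}
    for a, titles in [("Mike", "ABCD"), ("John", "EFG"), ("Joe", "HI"), ("Jack", "J")]
    for t in titles
]


def build_author_docs(books):
    """One document per unique author, carrying a precomputed book count."""
    counts = Counter(b["author"] for b in books)
    return [{"author": a, "book_count": n} for a, n in sorted(counts.items())]


def range_query_body(lo, hi, field="book_count"):
    """Range query over the hypothetical 'book_count' field."""
    return {"query": {"range": {field: {"gte": lo, "lte": hi}}}}


def count_in_range(author_docs, lo, hi):
    """Local stand-in for counting the hits of the range query above."""
    return sum(1 for d in author_docs if lo <= d["book_count"] <= hi)


author_docs = build_author_docs(BOOKS)
print(count_in_range(author_docs, 2, 3))  # 2 (Joe and John)
```

Because `book_count` is computed when the author document is written, this layout cannot by itself answer the title-filtered variant: the stored count does not know which titles a later query will select.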



On Thursday, June 19, 2014 6:43:41 PM UTC-4, Itamar Syn-Hershko wrote:
Re: How to find the number of authors who have written between 2-3 books?

Clinton Gormley-2
Alternatively, if you model this with parent-child, then you can use min_children/max_children, which will be available in the next release.


clint
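A rough sketch of how the parent-child idea might look, assuming author documents are parents and book documents are children of a type named `book` (both naming choices are assumptions, not something stated in the thread). The request body uses a `has_child` query with the `min_children` and `max_children` options mentioned above; the counting function only imitates the intended semantics locally.

```python
from collections import Counter

# The ten sample books from the question; each book is a child of its author.
BOOKS = [
    {"title": t, "author": a}
    for a, titles in [("Mike", "ABCD"), ("John", "EFG"), ("Joe", "HI"), ("Jack", "J")]
    for t in titles
]


def has_child_body(min_children, max_children, child_query=None, child_type="book"):
    """has_child query matching parents by their number of matching children."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # hypothetical child type name
                "query": child_query or {"match_all": {}},
                "min_children": min_children,
                "max_children": max_children,
            }
        }
    }


def count_parents(books, min_children, max_children, titles=None):
    """Local imitation: count authors whose matching books fall in the range.

    Authors with no matching books never appear in the Counter, which is
    fine as long as min_children is at least 1.
    """
    matching = Counter(
        b["author"] for b in books if titles is None or b["title"] in titles
    )
    return sum(1 for n in matching.values() if min_children <= n <= max_children)


print(count_parents(BOOKS, 2, 3))                         # 2
print(count_parents(BOOKS, 2, 3, titles={"E", "F", "H", "I"}))  # 2
```

Since the child query is evaluated per request, this layout also covers the title-filtered variant of the question: only the children that match the filter count toward the minimum and maximum.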


On 20 June 2014 17:15, Mike <[hidden email]> wrote: