Using aggregations for OLAP

Using aggregations for OLAP

Roy Jacobs
I am interested in using the new aggregations support to implement something similar to an OLAP cube.

Let's say I have a large set of documents that represent orders. On those documents I want to calculate several metrics (using "metric" aggregations) based on various fields, such as "# of items". Then I want to group the results (using a "bucket" aggregation) by brand, for instance. All of this is multi-tenant as well, so every query needs to filter out a lot of irrelevant data.
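
To make the question concrete, here is a minimal sketch of the kind of request body described above: restrict to one tenant, bucket by brand, and sum the item count per bucket. The field names (tenant_id, brand, item_count) and the tenant value are hypothetical, and the tenant filter uses the bool/filter form from the query DSL of more recent Elasticsearch releases; the 1.x releases current at the time of this thread expressed the same thing with a "filtered" query. The code only builds and prints the JSON body, it does not talk to a cluster.

```python
import json


def build_orders_cube_query(tenant_id):
    """Build a search body: filter to one tenant, bucket by brand, sum items per bucket.

    All field names here (tenant_id, brand, item_count) are assumptions
    made for illustration; they do not come from a real mapping.
    """
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "query": {
            "bool": {
                # filter context: no scoring, just include/exclude documents
                "filter": [{"term": {"tenant_id": tenant_id}}]
            }
        },
        "aggs": {
            "by_brand": {
                # bucket aggregation: one bucket per distinct brand value
                "terms": {"field": "brand"},
                "aggs": {
                    # metric aggregation computed inside each brand bucket
                    "items_total": {"sum": {"field": "item_count"}}
                },
            }
        },
    }


body = build_orders_cube_query("tenant-42")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of a search request; nesting the sum under the terms aggregation is what gives one "# of items" figure per brand.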

The number of documents is quite high (hundreds of millions), so I was wondering whether aggregations have any form of caching or precalculation, or whether they have to traverse the entire index on every query. That could also be quite prohibitive memory-wise.

Has anyone been using aggregations in this manner?

Roy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/08487eb4-5a1e-4e30-a873-8d0623d4f355%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Using aggregations for OLAP

davrob
I'm not sure what the 'official' Elasticsearch view on this is, but to me, from day 1, Elasticsearch has had the capability to do everything that OLAP cubes can do, in a lightweight, agile way. Creating dimensions in cubes is the same as pre-indexing calculated fields in the index.

e.g. week: 6, month: 2, quarter: 1, year: 2013, decade: second, century: 21st, etc. are effectively dimensions calculated from a single date fact: 7th February 2013.
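
As a rough sketch of that pre-calculation step, the function below derives those dimension fields from a single date before the document is indexed. The function name and output field names are made up for illustration, the week is assumed to be the ISO 8601 week number, and decade and century are assumed to be counted from year 1 (so 2013 falls in the 2nd decade of the 21st century).

```python
from datetime import date


def date_dimensions(d):
    """Derive calendar dimensions from one date so they can be indexed as plain fields.

    Assumptions: ISO 8601 week numbering; centuries and decades counted
    from year 1, i.e. the 21st century runs from 2001 to 2100.
    """
    year = d.year
    return {
        "week": d.isocalendar()[1],          # ISO week number
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,   # months 1-3 -> 1, 4-6 -> 2, ...
        "year": year,
        "decade_of_century": ((year - 1) % 100) // 10 + 1,
        "century": (year - 1) // 100 + 1,
    }


dims = date_dimensions(date(2013, 2, 7))
print(dims)
```

Merging the returned dictionary into each order document at index time means a later aggregation can bucket on "quarter" or "week" directly instead of computing them per document with a script.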

The aggregations framework adds an immense amount of power. Flexible aggregations on top of fast search and powerful sorting capabilities make a pretty amazing package for business analytics, without any of the hype and expense typically associated with OLAP and Business Intelligence.

I guess time will tell on the performance front, but I'm quite optimistic. In the end, aggregations and facets are just big in-memory map-reduce jobs. If you pre-calculate a lot of the dimensions you are interested in, rather than relying on scripts, you should get pretty decent performance.
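
To illustrate the map-reduce analogy, here is a purely conceptual sketch, not Elasticsearch's actual implementation: each shard computes partial per-brand sums over its own documents (the map step), and the partial results are then merged into one answer (the reduce step). The shard contents, brand names, and field names are invented for the example.

```python
from collections import Counter


def map_shard(docs):
    """Map step: per-brand item totals for the documents held by one shard."""
    partial = Counter()
    for doc in docs:
        partial[doc["brand"]] += doc["item_count"]
    return partial


def reduce_partials(partials):
    """Reduce step: merge the per-shard totals into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


# Hypothetical data: two shards, each holding a few order documents.
shards = [
    [{"brand": "acme", "item_count": 3}, {"brand": "globex", "item_count": 1}],
    [{"brand": "acme", "item_count": 2}, {"brand": "initech", "item_count": 5}],
]

totals = reduce_partials(map_shard(docs) for docs in shards)
print(dict(totals))
```

In this picture a pre-calculated dimension is just another field read in the map step, whereas a script would have to be evaluated for every document on every query.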

-David.

On Monday, 20 January 2014 15:17:49 UTC, Roy Jacobs wrote:
I am interested in using the new aggregations support to implement something similar to an OLAP cube.

Let's say I have a large set of documents that represent orders. On those documents I want to calculate several metrics (using "metric" aggregations) based on various fields, such as "# of items". Then I want to group the results (using a "bucket" aggregation) by brand, for instance. All of this is multi-tenant as well, so every query needs to filter out a lot of irrelevant data.

The number of documents is quite high (hundreds of millions), so I was wondering whether aggregations have any form of caching or precalculation, or whether they have to traverse the entire index on every query. That could also be quite prohibitive memory-wise.

Has anyone been using aggregations in this manner?

Roy
