Calculating rolling average using aggregations

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Calculating rolling average using aggregations

Simon Cast
Hi,

I'm looking at using the new aggregations module of Elasticsearch to produce some metrics for my application.

I want to calculate the rolling average for every hour in the last 28 days and then plot these on a chart. I can see how this could be done using a query per hour for the last 28 days. This seems problematic in terms of load and performance. Can this be done using a single query?

Regards,

Simon


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d0a2dba7-f5a5-4944-ae0e-c6771b43ad6c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Clinton Gormley-2
Yes, easily.  Aggregations are really powerful.  Here's an example:

# First insert some data
{
  "created": "2014/03/10 12:05:00",
  "somefield": 10
}'

{
  "created": "2014/03/10 12:05:00",
  "somefield": 5
}'

{
  "created": "2014/03/9 12:05:00",
  "somefield": 5
}'


# Then return all results in the last 28 days
# and calculate the avg per hour
{
  "query": {
    "range": {
      "created": {
        "gte": "now-28d/d",
        "to": "now"
      }
    }
  },
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "created",
        "interval": "hour"
      },
      "aggs": {
        "rolling_avg": {
          "avg": {
            "field": "somefield"
          }
        }
      }
    }
  }
}'



On 11 March 2014 09:26, Simon Cast <[hidden email]> wrote:
Hi,

I'm looking at using the new aggregations module of Elasticsearch to produce some metrics for my application.

I want to calculate the rolling average for every hour in the last 28 days and then plot these on a chart. I can see how this could be done using a query per hour for the last 28 days. This seems problematic in terms of load and performance. Can this be done using a single query?

Regards,

Simon


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d0a2dba7-f5a5-4944-ae0e-c6771b43ad6c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKS-tpSZMwM8WVBuXg%3Dez618xbTnCSTXAkWGrFsSxmdnew%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Simon Cast
Thanks Clinton.

That looks interesting but that creates an average of values per hour for the last 28 days. What I am looking for more precisely is to calculate a 28 rolling (or moving average) using the last 28 days of data and redoing that calculation every hour.

I suppose what could be done is to do an average on a field that is a hourly count.



On Tuesday, March 11, 2014 11:35:40 AM UTC, Clinton Gormley wrote:
Yes, easily.  Aggregations are really powerful.  Here's an example:

# First insert some data
curl -XPUT "<a href="http://localhost:9200/myindex/mytype/1" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F1\46sa\75D\46sntz\0751\46usg\75AFQjCNFVblE8aFwIRFDwuFmR-XJtg5u44w';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F1\46sa\75D\46sntz\0751\46usg\75AFQjCNFVblE8aFwIRFDwuFmR-XJtg5u44w';return true;">http://localhost:9200/myindex/mytype/1" -d'
{
  "created": "2014/03/10 12:05:00",
  "somefield": 10
}'

curl -XPUT "<a href="http://localhost:9200/myindex/mytype/2" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F2\46sa\75D\46sntz\0751\46usg\75AFQjCNF2TTSajA5TFyA9QyZ2-VObkdy0GQ';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F2\46sa\75D\46sntz\0751\46usg\75AFQjCNF2TTSajA5TFyA9QyZ2-VObkdy0GQ';return true;">http://localhost:9200/myindex/mytype/2" -d'
{
  "created": "2014/03/10 12:05:00",
  "somefield": 5
}'

curl -XPUT "<a href="http://localhost:9200/myindex/mytype/3" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F3\46sa\75D\46sntz\0751\46usg\75AFQjCNGDwwdWcblHsC_6UcoIRkAAMd1OPQ';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F3\46sa\75D\46sntz\0751\46usg\75AFQjCNGDwwdWcblHsC_6UcoIRkAAMd1OPQ';return true;">http://localhost:9200/myindex/mytype/3" -d'
{
  "created": "2014/03/9 12:05:00",
  "somefield": 5
}'


# Then return all results in the last 28 days
# and calculate the avg per hour
curl -XGET "<a href="http://localhost:9200/myindex/mytype/_search" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F_search\46sa\75D\46sntz\0751\46usg\75AFQjCNFUF4VJJ5___af06y7Ii0h2rxfFfA';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Fmyindex%2Fmytype%2F_search\46sa\75D\46sntz\0751\46usg\75AFQjCNFUF4VJJ5___af06y7Ii0h2rxfFfA';return true;">http://localhost:9200/myindex/mytype/_search" -d'
{
  "query": {
    "range": {
      "created": {
        "gte": "now-28d/d",
        "to": "now"
      }
    }
  },
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "created",
        "interval": "hour"
      },
      "aggs": {
        "rolling_avg": {
          "avg": {
            "field": "somefield"
          }
        }
      }
    }
  }
}'



On 11 March 2014 09:26, Simon Cast <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="W3n10mGqvRkJ" onmousedown="this.href='javascript:';return true;" onclick="this.href='javascript:';return true;">simon...@...> wrote:
Hi,

I'm looking at using the new aggregations module of Elasticsearch to produce some metrics for my application.

I want to calculate the rolling average for every hour in the last 28 days and then plot these on a chart. I can see how this could be done using a query per hour for the last 28 days. This seems problematic in terms of load and performance. Can this be done using a single query?

Regards,

Simon


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="W3n10mGqvRkJ" onmousedown="this.href='javascript:';return true;" onclick="this.href='javascript:';return true;">elasticsearc...@googlegroups.com.
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/elasticsearch/d0a2dba7-f5a5-4944-ae0e-c6771b43ad6c%40googlegroups.com?utm_medium=email&amp;utm_source=footer" target="_blank" onmousedown="this.href='https://groups.google.com/d/msgid/elasticsearch/d0a2dba7-f5a5-4944-ae0e-c6771b43ad6c%40googlegroups.com?utm_medium\75email\46utm_source\75footer';return true;" onclick="this.href='https://groups.google.com/d/msgid/elasticsearch/d0a2dba7-f5a5-4944-ae0e-c6771b43ad6c%40googlegroups.com?utm_medium\75email\46utm_source\75footer';return true;">https://groups.google.com/d/msgid/elasticsearch/d0a2dba7-f5a5-4944-ae0e-c6771b43ad6c%40googlegroups.com.
For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" onmousedown="this.href='https://groups.google.com/d/optout';return true;" onclick="this.href='https://groups.google.com/d/optout';return true;">https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/15324eca-2048-4a57-abe4-f1730e1bf70e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Clinton Gormley-2
Forgive my ignorance but could you provide a worked example of what you mean? I'm not getting exactly what you're after.


On 12 March 2014 13:24, Simon Cast <[hidden email]> wrote:
Thanks Clinton.

That looks interesting but that creates an average of values per hour for the last 28 days. What I am looking for more precisely is to calculate a 28 rolling (or moving average) using the last 28 days of data and redoing that calculation every hour.

I suppose what could be done is to do an average on a field that is a hourly count.



On Tuesday, March 11, 2014 11:35:40 AM UTC, Clinton Gormley wrote:
Yes, easily.  Aggregations are really powerful.  Here's an example:

# First insert some data
{
  "created": "2014/03/10 12:05:00",
  "somefield": 10
}'

{
  "created": "2014/03/10 12:05:00",
  "somefield": 5
}'

{
  "created": "2014/03/9 12:05:00",
  "somefield": 5
}'


# Then return all results in the last 28 days
# and calculate the avg per hour
{
  "query": {
    "range": {
      "created": {
        "gte": "now-28d/d",
        "to": "now"
      }
    }
  },
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "created",
        "interval": "hour"
      },
      "aggs": {
        "rolling_avg": {
          "avg": {
            "field": "somefield"
          }
        }
      }
    }
  }
}'



On 11 March 2014 09:26, Simon Cast <[hidden email]> wrote:
Hi,

I'm looking at using the new aggregations module of Elasticsearch to produce some metrics for my application.

I want to calculate the rolling average for every hour in the last 28 days and then plot these on a chart. I can see how this could be done using a query per hour for the last 28 days. This seems problematic in terms of load and performance. Can this be done using a single query?

Regards,

Simon


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/15324eca-2048-4a57-abe4-f1730e1bf70e%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKRFsY_F-dJdp2vN_yLOtHZ1-brdE%2BvxN7ATdapf4rnOow%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Binh Ly-2
Unfortunately a MA aggregation is not available at the moment (unless you accumulate/include all the prior data that you need into every doc, or you run multiple queries and aggregate it yourself - neither of which is desirable or flexible)

Clint, a MA is an average computed in a "sliding window" fashion, where the time interval slides as the average is computed. So for example, let's say I have a daily statistic for 5 days:

[Day1 = 100, Day 2 = 105, Day 3 = 101, Day 4 = 120, Day 5 = 200]

A 2 day in-the-past MA would be computed as follows:

Day1 = No data yet (since we don't have 2 days in-the-past worth of info)
Day2 = (Day1 + Day2) / 2 = (100 + 105) / 2
Day3 = (Day2 + Day3) / 2 = (105 + 101) / 2
...




--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89d909af-377e-44bd-8cec-31249f9f1d60%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Clinton Gormley-2
Heya Bihn

The part I'm not getting is this: "the rolling average for every hour in the last 28 days". ie what period should each bucket/rolling avg cover? an hour? 28 days?

You can still do rolling averages with aggregations, but they require a bit more work. I wanted to get the exact specs before trying to answer again :)

clint



On 12 March 2014 16:09, Binh Ly <[hidden email]> wrote:
Unfortunately a MA aggregation is not available at the moment (unless you accumulate/include all the prior data that you need into every doc, or you run multiple queries and aggregate it yourself - neither of which is desirable or flexible)

Clint, a MA is an average computed in a "sliding window" fashion, where the time interval slides as the average is computed. So for example, let's say I have a daily statistic for 5 days:

[Day1 = 100, Day 2 = 105, Day 3 = 101, Day 4 = 120, Day 5 = 200]

A 2 day in-the-past MA would be computed as follows:

Day1 = No data yet (since we don't have 2 days in-the-past worth of info)
Day2 = (Day1 + Day2) / 2 = (100 + 105) / 2
Day3 = (Day2 + Day3) / 2 = (105 + 101) / 2
...




--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89d909af-377e-44bd-8cec-31249f9f1d60%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKSkaL59uXBw6b9b_0r_%2BJHVPR-WL2P0JuXox90xCJDh%3Dg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Clinton Gormley-2
OK, Binh took me aside and explained what you are after: a number per hour which reflects the average of the previous 28*24 hours.

There are two ways you can do this with aggregations.  The first way is to create overlapping buckets, so each date will fall into multiple windows, eg:

    Jan 1, 10 am to Jan 28, 10am
    Jan 1, 11 am to Jan 28, 11am
    Jan 1, 12 am to Jan 28, 12am
    etc

... which can be done by constructing all of the date ranges you need, ie 28 * 24 of them.

The second way to do it is to use a script to convert each date into multiple dates: 
    $date
    $date - 1h
    $date - 2h
    $date - 3h
    ...
    $date - 672h

I've put together a simplified example (rolling avg per day) which demonstrates both techniques.  Unfortunately, the script version showed up a bug in aggregations, which will be fixed in 1.1.0 - currently you can include 4 values, but no more.


clint




On 12 March 2014 16:29, Clinton Gormley <[hidden email]> wrote:
Heya Bihn

The part I'm not getting is this: "the rolling average for every hour in the last 28 days". ie what period should each bucket/rolling avg cover? an hour? 28 days?

You can still do rolling averages with aggregations, but they require a bit more work. I wanted to get the exact specs before trying to answer again :)

clint



On 12 March 2014 16:09, Binh Ly <[hidden email]> wrote:
Unfortunately a MA aggregation is not available at the moment (unless you accumulate/include all the prior data that you need into every doc, or you run multiple queries and aggregate it yourself - neither of which is desirable or flexible)

Clint, a MA is an average computed in a "sliding window" fashion, where the time interval slides as the average is computed. So for example, let's say I have a daily statistic for 5 days:

[Day1 = 100, Day 2 = 105, Day 3 = 101, Day 4 = 120, Day 5 = 200]

A 2 day in-the-past MA would be computed as follows:

Day1 = No data yet (since we don't have 2 days in-the-past worth of info)
Day2 = (Day1 + Day2) / 2 = (100 + 105) / 2
Day3 = (Day2 + Day3) / 2 = (105 + 101) / 2
...




--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89d909af-377e-44bd-8cec-31249f9f1d60%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKQ3ZGG9Ak4AWUgsk%3DYMkQez9%3DLWdixzosRbDQS5ouqUiQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Clinton Gormley-2
I rethought this problem last night. The solutions I've presented already are a lot less efficient than they could be, as they increase the work per doc by a factor of the number of buckets (ie 24h * 28d = 672).

It'd be much more efficient to calculate this rolling average client side in a single pass over the data, eg:

{
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "total": {
          "sum": {
            "field": "num"
          }
        }
      }
    }
  }
}'

This gives you the doc count per interval, plus the total of `num` per interval. Now, it is easy to calculate the rolling average in a single pass across each slot:

1. Add up the doc counts and num totals for all values in the desired window.
2. Calc the average for the window
3. End if there are no more docs
3. Remove the oldest doc from total doc count and total num
4. Add the next doc into total doc count and total num
5. Repeat from step 2


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPt3XKSuE5M3reGFnWBY0EytVJUb2omkn1-XtsD9yrKutXtDBQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

Michaël Gallego
In reply to this post by Simon Cast
+1 for this feature!

What I need is pretty similar: calculate rolling sum, so for each day, I need to sum the previous 30 days (on each point). Oracle and Postgre make this very easy with aggregation function (and they can take advantage of very interesting optimization for sum, as each point is actually the sum of previous day + sum of elements in the given day). Actually, the simplest way I've found is asking for more data (so if I want rolling sum for 30 days, I ask for 60 days, and do the sum myself). But this is quite inefficient.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b72eea67-d8cb-4ac3-ac4f-4e984183731b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Calculating rolling average using aggregations

evanceheally
This post has NOT been accepted by the mailing list yet.
Hello Clint

Since its 2.5 years gone now, is there any way of doing this moving window average calculation in elasticsearch now?