Terms Facet with Histograms

vaidik
Hi,

I need term counts, but grouped into range buckets. Consider this scenario:

I have indexed the following documents in ES:

{ "timestamp": 120000000, "combination": "1", "goal": "1" }
{ "timestamp": 120000002, "combination": "1", "goal": "2" }
{ "timestamp": 120000003, "combination": "2", "goal": "1" }
{ "timestamp": 120000015, "combination": "1", "goal": "2" }
{ "timestamp": 120000040, "combination": "1", "goal": "3" }
{ "timestamp": 120000043, "combination": "2", "goal": "3" }
{ "timestamp": 120000057, "combination": "1", "goal": "2" }

The following terms facet on the above data:
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "comb_goal": {
      "terms": {
        "script_field": "_source.combination + '-' + _source.goal"
      }
    }
  }
}

would give me the following result:
{
  "terms": [
    {
      "term": "1-1",
      "count": 1
    },
    {
      "term": "1-2",
      "count": 3
    },
    {
      "term": "1-3",
      "count": 1
    },
    {
      "term": "2-1",
      "count": 1
    },
    {
      "term": "2-3",
      "count": 1
    }
  ]
}
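
For reference, the counts above can be reproduced outside Elasticsearch. The following is only a client-side sketch in Python, not something ES runs: it joins "combination" and "goal" the same way the script_field expression does and counts the resulting terms. The helper name term_counts is made up for illustration.

```python
from collections import Counter

# Sample documents copied from the post.
docs = [
    {"timestamp": 120000000, "combination": "1", "goal": "1"},
    {"timestamp": 120000002, "combination": "1", "goal": "2"},
    {"timestamp": 120000003, "combination": "2", "goal": "1"},
    {"timestamp": 120000015, "combination": "1", "goal": "2"},
    {"timestamp": 120000040, "combination": "1", "goal": "3"},
    {"timestamp": 120000043, "combination": "2", "goal": "3"},
    {"timestamp": 120000057, "combination": "1", "goal": "2"},
]


def term_counts(documents):
    """Count 'combination-goal' terms (hypothetical helper mirroring the script_field)."""
    return Counter(d["combination"] + "-" + d["goal"] for d in documents)


counts = term_counts(docs)
for term in sorted(counts):
    print(term, counts[term])
```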

What I actually need is the above term counts broken down into buckets by timestamp interval. In effect, I want a histogram that counts terms per bucket instead of computing numeric aggregates over a numeric field.
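
To make the desired output concrete, here is a small client-side sketch in Python of the result shape being asked for: term counts grouped by timestamp bucket. The interval of 20 and the helper name bucketed_term_counts are assumptions chosen for illustration; the post does not specify an interval.

```python
from collections import Counter, defaultdict

# Sample documents copied from the post.
docs = [
    {"timestamp": 120000000, "combination": "1", "goal": "1"},
    {"timestamp": 120000002, "combination": "1", "goal": "2"},
    {"timestamp": 120000003, "combination": "2", "goal": "1"},
    {"timestamp": 120000015, "combination": "1", "goal": "2"},
    {"timestamp": 120000040, "combination": "1", "goal": "3"},
    {"timestamp": 120000043, "combination": "2", "goal": "3"},
    {"timestamp": 120000057, "combination": "1", "goal": "2"},
]


def bucketed_term_counts(documents, interval):
    """Group documents into fixed-width timestamp buckets, then count terms per bucket."""
    buckets = defaultdict(Counter)
    for d in documents:
        # Round the timestamp down to the start of its bucket.
        key = (d["timestamp"] // interval) * interval
        buckets[key][d["combination"] + "-" + d["goal"]] += 1
    return {key: dict(counter) for key, counter in sorted(buckets.items())}


# interval=20 is an assumed value, not taken from the post.
result = bucketed_term_counts(docs, interval=20)
for bucket_start, counts in result.items():
    print(bucket_start, counts)
```

With the assumed interval, the seven sample documents fall into two buckets, and empty buckets are simply absent from the result.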

Is this possible at all using facets? The only way I can see is to run one terms facet per timestamp bucket, each restricted by a range query, which is what I am doing now. But then I have to form the buckets myself, and that means a large number of queries if my date range is large. One way to reduce the number of round trips would be to use multi-search. But isn't there a simpler way?
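
The multi-search workaround described above could look roughly like the following Python sketch, which builds a newline-delimited multi-search body with one header line and one search body per bucket. The index name "myindex", the interval, the bucket boundaries, and the helper name build_msearch_body are all assumptions for illustration; the sketch only builds the payload and does not send it anywhere.

```python
import json

# Facet definition taken from the post.
FACET = {
    "comb_goal": {
        "terms": {"script_field": "_source.combination + '-' + _source.goal"}
    }
}


def build_msearch_body(index, start, end, interval):
    """Build a multi-search payload with one range-restricted terms facet per bucket."""
    lines = []
    for lo in range(start, end, interval):
        hi = min(lo + interval, end)
        header = {"index": index}
        body = {
            "size": 0,  # only the facet counts are needed, not the hits
            "query": {"range": {"timestamp": {"gte": lo, "lt": hi}}},
            "facets": FACET,
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi-search format expects a trailing newline.
    return "\n".join(lines) + "\n"


# "myindex" and the bucket bounds are hypothetical values.
payload = build_msearch_body("myindex", 120000000, 120000060, 20)
print(payload)
```

This keeps it to a single HTTP request, but the number of sub-searches still grows with the date range, which is the drawback raised above.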

I look forward to any input on this problem.

Thanks,
Vaidik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Terms Facet with Histograms

dadoonet
I guess you will be able to do this with Elasticsearch 1.0 and its new aggregations feature.
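
As a rough sketch of what that might look like, the Python snippet below builds a request body that nests a terms aggregation inside a histogram aggregation on "timestamp". This assumes the 1.0-era aggregation syntax, where a terms aggregation accepts a "script" string; later versions changed the scripting syntax, so check the documentation for your version. The aggregation names "by_time" and "comb_goal", the interval of 20, and the helper name are illustrative assumptions.

```python
import json


def histogram_terms_request(interval):
    """Build a search body: histogram buckets on timestamp, term counts inside each bucket."""
    return {
        "size": 0,  # only aggregation results are needed
        "query": {"match_all": {}},
        "aggs": {
            "by_time": {  # hypothetical aggregation name
                "histogram": {"field": "timestamp", "interval": interval},
                "aggs": {
                    "comb_goal": {
                        # Assumed 1.0-era form: terms generated by a script.
                        "terms": {
                            "script": "_source.combination + '-' + _source.goal"
                        }
                    }
                },
            }
        },
    }


request = histogram_terms_request(20)  # interval is an assumed value
print(json.dumps(request, indent=2))
```

A single request of this shape would replace the per-bucket queries, since the bucketing happens server-side.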

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 15 Nov 2013, at 10:56, Vaidik Kapoor <[hidden email]> wrote:
