Best practices for dealing with a large number of small activity stream events

Hi all,
I'm evaluating Elasticsearch for a use case and would love some feedback on best practices.

A little background: I've been digging into approaches for letting end users interactively drill down into, slice, and dice activity stream data (actor / verb / target) for real-time analytics. This data is high-dimensional, with too many possible views to precompute rollups effectively. Similar systems I've experimented with include Druid, OpenTSDB, Blueflood, and InfluxDB. At the end of the day, they all either use an inverted index or have (or plan to have) Elasticsearch integrations, so I figure why not stick with ES.
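The claim that rollups can't realistically be precomputed is easy to quantify: every subset of the dimensions is a distinct group-by view, so d dimensions give 2^d rollups before any filter values are even considered. A minimal Python sketch; the dimension names beyond actor/verb/target are hypothetical placeholders for the ~15 fields mentioned below.

```python
from itertools import combinations
from math import comb


def rollup_count(dimensions: list[str]) -> int:
    """Count every group-by combination (the full cube lattice).

    Each subset of the dimensions, including the empty set (grand total),
    is one rollup that would have to be precomputed to serve that view.
    """
    return sum(
        1
        for k in range(len(dimensions) + 1)
        for _ in combinations(dimensions, k)
    )


# Hypothetical names; the post only says there are ~15 fields.
three = ["actor", "verb", "target"]
fifteen = three + [f"dim{i}" for i in range(12)]

print(rollup_count(three))    # 8
print(rollup_count(fifteen))  # 32768

# Closed form: the sum of C(d, k) over k equals 2**d.
assert rollup_count(fifteen) == sum(comb(15, k) for k in range(16)) == 2**15
```

This counts only which dimensions are grouped, not the values they are filtered on, so the real number of distinct views a user can reach is larger still.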

There are three areas I am trying to optimize:
- Minimize the index footprint on disk.
- Minimize the RAM footprint.
- Maximize query speed.

I believe the key tradeoff for my dataset is going to be doc_values, i.e. whether I rely on the JVM heap (fielddata) or on the OS page cache (doc_values).
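The heap vs. page-cache tradeoff follows from how the data is laid out. Below is a toy Python model (not Lucene's actual on-disk encoding) of the two structures: the inverted index maps each term to document IDs, while aggregations need the reverse, a per-document column of term ordinals. Without doc_values, Elasticsearch builds that column on the heap as fielddata by un-inverting the index; with doc_values, a similar column is written to disk at index time and read through the OS page cache. The field `verb` and its values are made up for illustration.

```python
from collections import defaultdict

# Toy documents: doc ID -> value of one not_analyzed field ("verb").
docs = {0: "like", 1: "share", 2: "like", 3: "comment", 4: "share"}

# Inverted index, conceptually: term -> sorted list of doc IDs.
postings = defaultdict(list)
for doc_id in sorted(docs):
    postings[docs[doc_id]].append(doc_id)

# Sorted term dictionary; a term's position in it is its ordinal.
terms = sorted(postings)


def uninvert(postings, terms, num_docs):
    """Rebuild a doc ID -> ordinal column by walking every postings list.

    Roughly what fielddata does on the heap; doc_values persist an
    equivalent column on disk at index time instead.
    """
    column = [-1] * num_docs  # -1 means "no value for this doc"
    for ordinal, term in enumerate(terms):
        for doc_id in postings[term]:
            column[doc_id] = ordinal
    return column


column = uninvert(postings, terms, len(docs))

# With the column in hand, a terms aggregation is one linear scan.
counts = [0] * len(terms)
for ordinal in column:
    if ordinal >= 0:
        counts[ordinal] += 1

print(column)                    # [1, 2, 1, 0, 2]
print(dict(zip(terms, counts)))  # {'comment': 1, 'like': 2, 'share': 2}
```

Either way the column exists; the choice is whether it lives in heap (built lazily, costing GC pressure) or on disk (built at index time, costing extra index bytes but served from the page cache).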

All ~15 of my fields are exact-match, not_analyzed fields. "not_analyzed" appears to disable all the extras that can cause bloat (norms, frequencies, etc.). I am not storing _source. Here is my index template:
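For readers following along, a template matching that description (not_analyzed string fields, `_source` disabled, doc_values optional) could be generated as below. This is a hypothetical illustration in Elasticsearch 1.x mapping syntax, not the template from the original post; the index pattern `activity-*` and the field names are placeholders. The `"doc_values": true` flag is assumed to be accepted by your 1.x release; earlier 1.x versions expressed the same thing as `"fielddata": {"format": "doc_values"}`.

```python
import json


def build_template(fields, pattern="activity-*", doc_values=False):
    """Return an ES 1.x-style index template body as a plain dict.

    Hypothetical sketch: pattern and field names are placeholders.
    """
    field_mapping = {"type": "string", "index": "not_analyzed"}
    if doc_values:
        # Assumed flag; older 1.x releases used
        # {"fielddata": {"format": "doc_values"}} instead.
        field_mapping["doc_values"] = True
    return {
        "template": pattern,
        "mappings": {
            "_default_": {
                # Don't keep the original JSON document on disk.
                "_source": {"enabled": False},
                # One independent copy of the mapping per field.
                "properties": {name: dict(field_mapping) for name in fields},
            }
        },
    }


template = build_template(["actor", "verb", "target"], doc_values=True)
print(json.dumps(template, indent=2))
```

The printed JSON is the body that would be sent with `PUT _template/<name>`; toggling `doc_values` is the single switch behind the 60 vs. 80 bytes per document comparison.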

With some test data, I'm getting pretty solid results. Messages average ~360 bytes, and on disk I'm getting:
- ~60 bytes per document without doc_values
- ~80 bytes per document with doc_values
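Put as ratios, those figures mean roughly 6x compression without doc_values, 4.5x with them, and a one-third size increase for turning doc_values on. A small Python sketch of that arithmetic, using only the numbers quoted above; the 160-million-doc projection assumes per-document cost scales linearly, which is an assumption rather than a measurement.

```python
def size_ratios(raw_bytes, without_dv, with_dv):
    """Return (compression w/o doc_values, compression with, doc_values overhead)."""
    return (
        raw_bytes / without_dv,
        raw_bytes / with_dv,
        (with_dv - without_dv) / without_dv,
    )


# Figures from the post: ~360-byte messages, ~60 / ~80 bytes per doc on disk.
no_dv, dv, overhead = size_ratios(360, 60, 80)
print(f"compression without doc_values: {no_dv:.1f}x")    # 6.0x
print(f"compression with doc_values:    {dv:.1f}x")       # 4.5x
print(f"doc_values overhead:            {overhead:.0%}")  # 33%

# Linear projection (assumption) for a 160M-doc index with doc_values.
docs = 160_000_000
print(f"projected size with doc_values: {docs * 80 / 1e9:.1f} GB")  # 12.8 GB
```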

On a test index with ~160 million docs without doc_values, I'm at 9.6GB of data, with this file breakdown:
3.8G Jul 23 09:40 _mwf.fdt
3.9G Jul 23 10:32 _mwf_es090_0.tim
1.8G Jul 23 10:32 _mwf_es090_0.doc
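Dividing each file by the document count shows where the bytes go. In Lucene terms, `.fdt` is stored-fields data, `.tim` is the term dictionary, and `.doc` holds the postings (doc IDs and frequencies). A Python sketch of the per-document breakdown; since the listing looks like `ls -lh` output, it computes both the binary (GiB) and decimal (GB) readings, and because the doc count is approximate the results are too.

```python
DOCS = 160_000_000  # "~160million" in the post, so results are approximate

# File sizes (in G) from the listing, labeled with their Lucene role.
FILES = {
    "_mwf.fdt (stored fields data)": 3.8,
    "_mwf_es090_0.tim (term dictionary)": 3.9,
    "_mwf_es090_0.doc (postings: doc IDs, freqs)": 1.8,
}


def per_doc_bytes(sizes_g, docs, unit=2**30):
    """Convert per-file sizes to bytes per document.

    unit=2**30 treats sizes as GiB (`ls -lh` style); pass 10**9 for GB.
    """
    return {name: g * unit / docs for name, g in sizes_g.items()}


for unit, label in ((2**30, "GiB"), (10**9, "GB")):
    breakdown = per_doc_bytes(FILES, DOCS, unit)
    print(f"--- assuming sizes are in {label}")
    for name, b in breakdown.items():
        print(f"{name:45s} {b:5.1f} bytes/doc")
    total = sum(breakdown.values())
    print(f"{'total of listed files':45s} {total:5.1f} bytes/doc")
```

The listed files total roughly 59 to 64 bytes per document depending on the unit, consistent with the ~60 bytes quoted earlier, and stored fields and the term dictionary each account for about 40% of it.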

Does anybody know how I can slim things down further, or have general advice for dealing with large numbers of small documents?


You received this message because you are subscribed to the Google Groups "elasticsearch" group.