Currently, we find ourselves wanting to replace our metrics collection & visualization system with something else.
Our present solution is a homegrown, RRDTool-based monitoring solution, which has served us well for many years.
Today, it ingests a steady stream of ~3.5M metrics at 5min “pre-aggregation” intervals.
It is simple and easy to use, and our engineers essentially get it all for free.
RRDTool graphs (we have a simple DSL/cacti-ish app around these) are the lingua franca of our org. People pass the RRD URLs around all day long, typically munging the URLs as needed, which lets us debug incidents quickly.
Plus, we spend almost zero time on care-and-feeding it.
But that system has its limitations. Among others, it has limited data retention (fixed-size hour, day, week, month, quarter, and year buckets) and limited data resolution (we want something finer: 10s).
It “loses data” as we move outward in time.
And, as we migrate to an ephemeral provisioning solution (docker/mesos), its host orientation doesn’t fit.
We have investigated many of the open source metrics collection & visualization alternatives out there: InfluxDB, Graphite, KairosDB, Cyanite, …
All of them come up short in one way or another.
Because we are big users/lovers of Elasticsearch
(we run several large-ish clusters, for a myriad of use-cases, each with billions of documents running at relatively high TPS),
ES is a natural choice for us, particularly given its Aggregations.
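To make the Aggregations point concrete, here is a rough sketch of the kind of query body we have in mind: one metric, bucketed by time, averaged per bucket. The field names (`name`, `@timestamp`, `value`) and the helper `build_metric_query` are hypothetical, not an existing schema. The `fixed_interval` parameter is what recent Elasticsearch versions use for `date_histogram`; older versions used `interval` instead.

```python
import json


def build_metric_query(metric_name, start, end, interval="10s"):
    """Build a search body that averages one metric per time bucket.

    Field names ("name", "@timestamp", "value") are assumptions for
    illustration only; adjust them to the actual document mapping.
    """
    return {
        "size": 0,  # return aggregation results only, no raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": metric_name}},
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        },
        "aggs": {
            "per_interval": {
                # Older Elasticsearch versions expect "interval" here.
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,
                },
                "aggs": {"avg_value": {"avg": {"field": "value"}}},
            }
        },
    }


if __name__ == "__main__":
    # "cpu.user" is a made-up metric name.
    print(json.dumps(build_metric_query("cpu.user", "now-1h", "now"), indent=2))
```

The body is plain JSON, so it can be sent with any client or with curl; whether this returns in sub-second time at our volume is exactly the open question.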
That said, we are talking about a pretty large amount of data per day: indexing ~30.24B documents/day when using 10s intervals.
That data, in turn, must be accessed/graphed in sub-second time. (Typically, engineers need metrics when things have gone south, and every second counts :~)
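For anyone checking the ~30.24B figure, the arithmetic is below. It assumes one document per metric per interval, which is what the number above implies; the function names are just for illustration.

```python
METRICS = 3_500_000            # steady stream of ~3.5M metrics
SECONDS_PER_DAY = 24 * 60 * 60


def documents_per_day(metrics, interval_seconds):
    """Documents indexed per day, assuming one document per metric per interval."""
    return metrics * (SECONDS_PER_DAY // interval_seconds)


def documents_per_second(metrics, interval_seconds):
    """Average sustained indexing rate needed to keep up."""
    return metrics / interval_seconds


print(documents_per_day(METRICS, 10))     # 30240000000 (~30.24B at 10s)
print(documents_per_day(METRICS, 300))    # 1008000000 (~1B at today's 5min)
print(documents_per_second(METRICS, 10))  # 350000.0 docs/s on average
```

So moving from 5min to 10s intervals is a 30x increase in document count, and it works out to roughly 350K documents/second sustained.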
I am wondering if anyone has experience they can share with me of using ES (at scale) for metrics collection & visualization.
Or, better, knows of off-the-shelf open source solutions built on it.
As far as I can see so far, the conventional wisdom is Logstash/Kibana?
But is there any solution out there that is more specifically tailored to this job?