Splunk vs. Elasticsearch performance?


Splunk vs. Elasticsearch performance?

Frank Flynn
We have a large Splunk instance.  We load about 1.25 TB of logs a day.  We have about 1,300 loaders (servers that collect and load logs; they may do other things too).

As I look at Elasticsearch / Logstash / Kibana, does anyone know of a performance comparison guide?  Should I expect to run on very similar hardware?  More, or less?

Sure, it depends on exactly what we're doing, the exact queries and how often we'd run them, but I'm trying to get some kind of idea before we start.

Are there any white papers or other documents about switching?  It seems an obvious choice, but I can find very few performance comparisons.  (I did see that Elasticsearch just hired "the former VP of Products at Splunk, Gaurav Gupta," but there were few numbers in that article either.)

Thanks,
Frank

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Splunk vs. Elasticsearch performance?

Mark Walkom
That's a lot of data! I don't know of any installations that big, but someone else might.

What sort of infrastructure are you running Splunk on now, and what are your current and expected retention periods?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 19 April 2014 07:33, Frank Flynn <[hidden email]> wrote:

Re: Splunk vs. Elasticsearch performance?

Greg Murnane
In reply to this post by Frank Flynn
I'm running Elasticsearch at a much smaller scale than this, but with a PowerEdge R900 with 2 X7350 CPUs and 64 GB of RAM (24 GB heap for Elasticsearch), I'm able to sustain something like 80 GB per day (1/16 of your volume). Some of the latest Intel CPUs are about 4 times as powerful as the X7350, so extrapolating from my results, with very new hardware you can probably do 1.25 TB per day on around 5 nodes with 2 CPUs, 256 GB RAM, and 8 disks each. I haven't had an opportunity to test this yet, and even if it is possible, you should probably have more nodes than this: hardware failure, growth, or a sudden increase in logging volume from a problem can take down a cluster that's running at full capacity all the time.
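That extrapolation can be sketched as back-of-the-envelope arithmetic; the per-node rate and the 4x CPU factor are my estimates above, not measurements:

```python
import math

# Figures from the post above: one older node (2x X7350) sustains ~80 GB/day,
# and newer CPUs are assumed to be roughly 4x as powerful. Estimates only.
per_node_gb_per_day = 80
cpu_speedup = 4
target_gb_per_day = 1250  # 1.25 TB/day

modern_node_gb_per_day = per_node_gb_per_day * cpu_speedup  # ~320 GB/day per node
nodes_at_full_load = math.ceil(target_gb_per_day / modern_node_gb_per_day)
print(nodes_at_full_load)  # 4 at full utilization; plan for ~5+ to leave headroom
```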

I'd encourage you to put Elasticsearch on some of your systems to generate some benchmarks. I've never tried clustering Elasticsearch with more than 5 hosts. At 1,300 systems, each would be sending around 15 KB/s, which is essentially trivial. You might try taking Splunk off two dozen systems or so, dedicating them to Elasticsearch, and then seeing how well they keep up with the load you're generating. Data from your particular setup will almost always be the best sort to have.
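For such a trial, the bulk API is the natural way to drive indexing load. A minimal sketch of building a newline-delimited _bulk body in Python (the index name, type name, and document shape are made up for illustration):

```python
import json

def bulk_body(docs, index="logs-test", doc_type="logline"):
    """Build a newline-delimited _bulk request body (ES 1.x style, with _type)."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action/metadata line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

body = bulk_body([{"message": "GET /health 200"}, {"message": "GET / 500"}])
print(body)
```

POST that body to `/_bulk` in batches and time the responses to get a docs-per-second figure for your own hardware.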



Re: Splunk vs. Elasticsearch performance?

Jaguar

We have a cluster with 10 nodes and a 48 GB heap for each ES process. The total indexing rate is about 25,000 docs per second, with about 20 indices actively receiving new data. I'm really curious to compare and evaluate the indexing performance numbers.

Thanks!


Re: Splunk vs. Elasticsearch performance?

Clinton Gormley-2
Goldman Sachs gave a talk about how they're using Elasticsearch to index 5 TB of log data per day. I can't find the video of the talk, but from a blog post about it:

Next was Indy Tharmakumar from our hosts Goldman Sachs, showing how his team have built powerful support systems using ElasticSearch to index log data. Using 32 single-core CPU instances, the system they have built can store 1.2 billion log lines with a throughput of up to 40,000 messages a second (the systems monitored produce 5TB of log data every day). Log data is queued up in Redis, distributed to many Logstash processes, and indexed by Elasticsearch with a Kibana front end. They learned that Logstash can be particularly CPU intensive but Elasticsearch itself scales extremely well. Future plans include considering Apache Kafka as a data backbone.
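The queue-fan-out pipeline described there corresponds roughly to a Logstash configuration along these lines; the host names and queue key are placeholders, not details from the talk:

```
input {
  redis {
    host      => "redis.example.com"   # placeholder queue host
    data_type => "list"
    key       => "logstash"            # assumed list key
  }
}
output {
  elasticsearch {
    host => "es.example.com"           # placeholder ES node
  }
}
```

Running many Logstash processes with the same redis input lets them share one queue, which is what absorbs bursts ahead of the indexers.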


On 19 April 2014 06:46, 熊贻青 <[hidden email]> wrote:


Re: Splunk vs. Elasticsearch performance?

Frank Flynn
In reply to this post by Frank Flynn
Thanks for the tips so far.  I should have been a bit more specific.  It's Saturday today and I'm doing this off the top of my head, so I might be off by a bit, but as I recall, in Splunk right now we have the equivalent of 11 indexes.  The biggest one runs 4 GB a day; all together they run 1.2 TB a day.  We retain the data for 90 days.  We have 12 machines indexing the data in EC2 (m2.4xlarge), and although it works fine, it is too slow (users complain about report speed).

If ES works out, the money I can save from not renewing my Splunk license could easily double the number of servers, upgrade them to the i class (SSD storage with big RAM), and send the team to Europe for a couple of weeks (although the trip to Europe is not my decision).

I will look for the Goldman Sachs talk.  My plan, after reading the ES website, is to leave Splunk alone, fork the data for one index to a new ES cluster alongside Splunk, and then make the comparisons.  My only issue is that if I go with the i instances (with SSDs), it's not a fair comparison for benchmarking.  That may not be a big deal for me, but I'd love to see the apples-to-apples numbers.

Frank


Re: Splunk vs. Elasticsearch performance?

Sabareesh SS
In reply to this post by Frank Flynn
What are the different ways I can make a good use of Elasticsearch?

On Saturday, April 19, 2014 3:03:59 AM UTC+5:30, Frank Flynn wrote:

Re: Splunk vs. Elasticsearch performance?

Thomas Paulsen
In reply to this post by Mark Walkom
We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1,000 guaranteed IOPS assigned. The system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system; for just dev work it is OK, if you don't mind the hassle of setting up and operating it. The costs you save by not buying a Splunk license you have to invest in consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.

On Saturday, 19 April 2014 at 00:07:44 UTC+2, Mark Walkom wrote:

Re: Splunk vs. Elasticsearch performance?

Mark Walkom
I'd be interested in knowing what problems you had with ELK, if you don't mind sharing.

I understand the ease of Splunk, but ELK isn't that difficult if you have some in-house Linux skills.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 19 June 2014 22:48, Thomas Paulsen <[hidden email]> wrote:

Re: Splunk vs. Elasticsearch performance?

joergprante@gmail.com
In reply to this post by Thomas Paulsen
You are right to note that Elasticsearch ships with developer settings: that is exactly what a packaged ES is meant for.

If you find issues when configuring and setting up ES for critical use, it would be nice to post them so others can find help too, and maybe share their solutions, because there are ES installations that run successfully in critical environments.

From just a mention of dev teams' "hate," it is rather impossible for me to learn the reason why this is so. Facts matter more than emotions for fixing software issues. The power of open source is that such issues can be fixed with the help of public discussion in the community. With closed software products, you cannot rely on issues being discussed publicly to find the best ways to fix them.

Jörg



On Thu, Jun 19, 2014 at 2:48 PM, Thomas Paulsen <[hidden email]> wrote:

Re: Splunk vs. Elasticsearch performance?

InquiringMind
In reply to this post by Thomas Paulsen
Thomas,

Thanks for your insights and experiences. As I am someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all field. The entire text of the log events generated by logstash ends up in the message field (not @message, as many people incorrectly post), so the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene query from Kibana.
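Concretely, both changes can go into the logstash index template; a minimal sketch in ES 1.x syntax (this body is illustrative, not a copy of the stock logstash template):

```
PUT _template/logstash
{
  "template": "logstash-*",
  "settings": {
    "index.query.default_field": "message"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}
```

New indices created from the template then skip _all entirely, and unqualified query terms search the message field.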

During the year that I've used ES and watched this group, I have been on the front line of a brand new product with a smart and dedicated development team working steadily to improve the product. Six months ago, the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since six months ago, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and to prevent users who are external to the Splunk db itself (though not to our company) from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to address user isolation. But I am confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before deployment to all of the folks in our company who currently have access to Splunk.
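At the Elasticsearch level, the operator can at least be set per request with a query_string query; a Sense sketch (the index pattern and search terms are illustrative):

```
GET logstash-*/_search
{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "error timeout"
    }
  }
}
```

With "AND" as the default operator, both terms must match, instead of either one as with the default "OR".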

Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handling all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the Node client) directly contradict the fact that logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe the data into my Java bulk-load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but not catastrophic. The front-end following of rotated log files will be done with GNU tail and its uppercase -F option, which follows rotated log files perfectly. I doubt that logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). GNU tail -F piped into logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying.
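The forwarder stage is then just a pipeline like the following sketch (the log path and config file name are hypothetical):

```
# GNU tail's -F follows the file by name, reopening it after rotation,
# so log events keep flowing into the logstash stdin input across rotations.
tail -F /var/log/app/app.log | logstash -f stdin-to-es.conf
```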

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:
We had a 2,2TB/d installation of Splunk and ran it on VMWare with 12 Indexer and 2 Searchheads. Each indexer had 1000IOPS guaranteed assigned. The system is slow but ok to use. 

We tried Elasticsearch and we were able to get the same performance with the same amount of machines. Unfortunately with Elasticsearch you need almost double amount of storage, plus a LOT of patience to make is run. It took us six months to set it up properly, and even now, the system is quite buggy and instable and from time to time we loose data with Elasticsearch. 

I don´t recommend ELK for a critical production system, for just dev work, it is ok, if you don´t mind the hassle of setting up and operating it. The costs you save by not buying a splunk license you have to invest into consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:
We had a 2,2TB/d installation of Splunk and ran it on VMWare with 12 Indexer and 2 Searchheads. Each indexer had 1000IOPS guaranteed assigned. The system is slow but ok to use. 

We tried Elasticsearch and we were able to get the same performance with the same amount of machines. Unfortunately with Elasticsearch you need almost double amount of storage, plus a LOT of patience to make is run. It took us six months to set it up properly, and even now, the system is quite buggy and instable and from time to time we loose data with Elasticsearch. 

I don´t recommend ELK for a critical production system, for just dev work, it is ok, if you don´t mind the hassle of setting up and operating it. The costs you save by not buying a splunk license you have to invest into consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.

Am Samstag, 19. April 2014 00:07:44 UTC+2 schrieb Mark Walkom:
That's a lot of data! I don't know of any installations that big but someone else might.

What sort of infrastructure are you running splunk on now, what's your current and expected retention?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: <a href="http://www.campaignmonitor.com" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Fwww.campaignmonitor.com\46sa\75D\46sntz\0751\46usg\75AFQjCNFv30c-WBiP6sfBmxXaWBP5YBZg1Q';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Fwww.campaignmonitor.com\46sa\75D\46sntz\0751\46usg\75AFQjCNFv30c-WBiP6sfBmxXaWBP5YBZg1Q';return true;">www.campaignmonitor.com


On 19 April 2014 07:33, Frank Flynn <[hidden email]> wrote:
We have a large Splunk instance.  We load about 1.25 Tb of logs a day.  We have about 1,300 loaders (servers that collect and load logs - they may do other things too).

As I look at Elasticsearch / Logstash / Kibana does anyone know of a performance comparison guide?  Should I expect to run on very similar hardware?  More? or Less?

Sure it depends on exactly what we're doing, the exact queries and the frequency we'd run them but I'm trying to get any kind of idea before we start.

Are there any white papers or other documents about switching?  It seems an obvious choice but I can only find very little performance comparisons (I did see that Elasticsearch just hired "the former VP of Products at Splunk, Gaurav Gupta" - but there were few numbers in that article either).

Thanks,
Frank

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com?utm_medium=email&amp;utm_source=footer" target="_blank" onmousedown="this.href='https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com?utm_medium\75email\46utm_source\75footer';return true;" onclick="this.href='https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com?utm_medium\75email\46utm_source\75footer';return true;">https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com.
For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" onmousedown="this.href='https://groups.google.com/d/optout';return true;" onclick="this.href='https://groups.google.com/d/optout';return true;">https://groups.google.com/d/optout.


get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

Patrick Proniewski
On 20 June 2014, at 18:43, Brian wrote:

> Re: double the storage. I strongly recommend that ELK users disable the _all field. The entire text of the log events generated by logstash ends up in the message field (and not @message, as many people incorrectly post), so the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene query in Kibana.


"message" field can be edited during logstash filtering, but admitting it's enough, I would love to remove "_all" field and point Kibana to "message". Oddly, I can't find the "_all" field, neither in Sense, nor in Kibana. I know it's enabled:

GET _template/logstash

{
   "logstash": {
      "order": 0,
      "template": "logstash-*",
      "settings": {
         "index.refresh_interval": "5s"
      },
      "mappings": {
         "_default_": {
            "dynamic_templates": [
               {
                  "string_fields": {
                     "mapping": {
                        "index": "analyzed",
                        "omit_norms": true,
                        "type": "string",
                        "fields": {
                           "raw": {
                              "index": "not_analyzed",
                              "ignore_above": 256,
                              "type": "string"
                           }
                        }
                     },
                     "match_mapping_type": "string",
                     "match": "*"
                  }
               }
            ],
            "properties": {
               "geoip": {
                  "dynamic": true,
                  "path": "full",
                  "properties": {
                     "location": {
                        "type": "geo_point"
                     }
                  },
                  "type": "object"
               },
               "@version": {
                  "index": "not_analyzed",
                  "type": "string"
               }
            },
            "_all": {
               "enabled": true    <------
            }
         }
      },
      "aliases": {}
   }
}

But it looks like I can't retrieve/display its content. Any idea?
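For what it's worth, this behavior is expected in ES 1.x: _all is indexed but not stored by default, so it can be queried but its contents can never be displayed. A quick sketch in Sense (the logstash-* index pattern is assumed):

```
# Works: _all is indexed, so it can be searched.
GET logstash-*/_search
{
  "query": { "match": { "_all": "some term" } }
}

# Returns no value for the field: only stored fields can be
# retrieved, and _all defaults to "store": false.
GET logstash-*/_search
{
  "fields": [ "_all" ]
}
```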


Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

InquiringMind
Patrick,

Here's my template, along with where the _all field is disabled. You may wish to add this setting to your own template, and then also add the index setting to ignore malformed data (if someone's log entry occasionally slips in "null" or "no-data" instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : {  "type" : "string", "index" : "not_analyzed" },
          "logdate" : {  "type" : "string", "index" : "no" }
        }
      }
    }
  }
}
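One way to load a template like this, sketched with curl and assuming the JSON above is saved locally as automap.json (both the filename and the localhost address are illustrative):

```
curl -XPUT 'http://localhost:9200/_template/automap' --data-binary @automap.json
```

New indices matching logstash-* then pick up the mapping; existing indices are untouched.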

Brian


Re: Splunk vs. Elastic search performance?

Mark Walkom
In reply to this post by InquiringMind
I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, with the added benefit of not being locked to specific LS+ES versioning.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 21 June 2014 02:43, Brian <[hidden email]> wrote:
Thomas,

Thanks for your insights and experiences. As I am someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all field. The entire text of the log events generated by logstash ends up in the message field (and not @message, as many people incorrectly post), so the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene query in Kibana.

During the year that I've used ES and watched this group, I have been on the front line of a brand new product with a smart and dedicated development team working steadily to improve the product. Six months ago, the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since six months ago, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation. But I am confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before it can be deployed to all of the folks in our company who currently have access to Splunk.
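For the default-field half of this, the setting Brian alludes to is index.query.default_field; as far as I know there is no equivalent index setting for the default operator, but AND can always be written explicitly in the query box. A sketch for elasticsearch.yml:

```
# Make unqualified Lucene-syntax queries search "message" instead
# of _all (useful once _all is disabled):
index.query.default_field: message
```

Until Kibana grows a knob for the operator, something like status:500 AND host:web1 forces it per query (status and host being whatever fields your events actually carry).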

Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handling all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the Node client) directly contradict the fact that logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe it into my Java bulk load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but won't be catastrophic. And the front-end following of rotated log files will be done using the GNU tail command with its uppercase -F option, which follows rotated log files perfectly. I doubt that logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying.
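The forwarder stage described above can be sketched like this; the file paths and config name are made up for illustration:

```
# stdin-to-es.conf (logstash 1.4.x syntax):
#   input  { stdin { } }
#   output { elasticsearch_http { host => "localhost" } }
#
# GNU tail's uppercase -F re-opens the file after rotation:
tail -F /var/log/app/app.log | logstash -f stdin-to-es.conf
```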

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:
We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1,000 IOPS guaranteed. The system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system; for dev work it is OK, if you don't mind the hassle of setting up and operating it. The costs you save by not buying a Splunk license you have to invest in consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.


On Saturday, 19 April 2014 at 00:07:44 UTC+2, Mark Walkom wrote:
That's a lot of data! I don't know of any installations that big but someone else might.

What sort of infrastructure are you running splunk on now, what's your current and expected retention?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 19 April 2014 07:33, Frank Flynn <[hidden email]> wrote:
We have a large Splunk instance.  We load about 1.25 TB of logs a day.  We have about 1,300 loaders (servers that collect and load logs; they may do other things too).

As I look at Elasticsearch / Logstash / Kibana, does anyone know of a performance comparison guide?  Should I expect to run on very similar hardware?  More?  Or less?

Sure, it depends on exactly what we're doing, the exact queries and how often we'd run them, but I'm trying to get some kind of idea before we start.

Are there any white papers or other documents about switching?  It seems an obvious choice, but I can find very few performance comparisons (I did see that Elasticsearch just hired "the former VP of Products at Splunk, Gaurav Gupta", but there were few numbers in that article either).

Thanks,
Frank


Re: Splunk vs. Elastic search performance?

InquiringMind
Mark,

I've read one post (can't remember where) that the Node client was preferred, but have also read where the HTTP interface is minimal overhead. So yes, I am currently using logstash with the HTTP interface and it works fine.

I also performed some experiments with clustering (not much, due to resource and time constraints) and used unicast discovery. Then I read someone who strongly recommended multicast discovery, and I started to feel like I'd gone down the wrong path. Then I watched the ELK webinar and heard that unicast discovery was preferred. I think it's not a big deal either way; it's whatever works best for your particular networking infrastructure.
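For reference, the unicast setup mentioned here is only a couple of lines in elasticsearch.yml on each node (the hostnames are placeholders):

```
# ES 1.x zen discovery: disable multicast, list the nodes explicitly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es-node1", "es-node2", "es-node3"]
```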

In addition, I was recently given this link: http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded me at all, but it is a thought-provoking read. I am a little confused by some things, though. In all of my high-performance banging on ES, even with my time-to-live test feature enabled, I never lost any documents at all. But I wasn't using auto-id; I was specifying my own unique ID. And when run in my 3-node cluster (slow, due to being hosted by 3 VMs running on a dual-core machine), I still didn't lose any data. So I am not sure about the high-data-loss scenarios he describes in his missive; I have seen no evidence of any data loss due to false insert positives at all.

Brian

On Friday, June 20, 2014 6:30:27 PM UTC-4, Mark Walkom wrote:
I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, with the added benefit of not being locked to specific LS+ES versioning.

Regards,
Mark Walkom


Re: Splunk vs. Elastic search performance?

Itamar Syn-Hershko
The data loss scenarios in Aphyr's post are easily generated because his tools stress-test the database systems he evaluates to their limits; he's practically provoking the DBs he tests to fail (though they shouldn't, really).

In normal operation you shouldn't see failures, but what Aphyr showed is that when failure conditions do occur, the chances that you will are pretty high. Thanks to the fallacies of distributed computing, that basically means those failures are bound to happen every now and then. If, and how much, data you lose will vary based on volumes, setup, etc.

HTH

--

Itamar Syn-Hershko
http://code972.com | @synhershko
Freelance Developer & Consultant


On Sat, Jun 21, 2014 at 2:56 AM, Brian <[hidden email]> wrote:
Mark,

I've read one post (can't remember where) that the Node client was preferred, but have also read where the HTTP interface is minimal overhead. So yes, I am currently using logstash with the HTTP interface and it works fine.

I also performed some experiments with clustering (not much, due to resource and time constraints) and used unicast discovery. Then I read someone who strongly recommended multicast discovery, and I started to feel like I'd gone down the wrong path. Then I watched the ELK webinar and heard that unicast discovery was preferred. I think it's not a big deal either way; it's whatever works best for your particular networking infrastructure.

In addition, I was recently given this link: http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded me at all, but it is a thought-provoking read. I am a little confused by some things, though. In all of my high-performance banging on ES, even with my time-to-live test feature enabled, I never lost any documents at all. But I wasn't using auto-id; I was specifying my own unique ID. And when run in my 3-node cluster (slow, due to being hosted by 3 VMs running on a dual-core machine), I still didn't lose any data. So I am not sure about the high-data-loss scenarios he describes in his missive; I have seen no evidence of any data loss due to false insert positives at all.

Brian


On Friday, June 20, 2014 6:30:27 PM UTC-4, Mark Walkom wrote:
I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, with the added benefit of not being locked to specific LS+ES versioning.

Regards,
Mark Walkom


Re: Splunk vs. Elastic search performance?

Ivan Brusic
In reply to this post by Mark Walkom
I agree; I thought elasticsearch_http was actually the recommended route. Also, I have seen no reported issues with different client/server versions since 1.0. My current logstash setup (which is not production level, simply a dev logging tool) uses Elasticsearch 1.2.1 with Logstash 1.4.1 over the non-HTTP interface.

-- 
Ivan


On Fri, Jun 20, 2014 at 3:29 PM, Mark Walkom <[hidden email]> wrote:
I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, with the added benefit of not being locked to specific LS+ES versioning.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 21 June 2014 02:43, Brian <[hidden email]> wrote:
Thomas,

Thanks for your insights and experiences. As I am someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all field. The entire text of the log events generated by logstash ends up in the message field (and not @message, as many people incorrectly post), so the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene query in Kibana.

During the year that I've used ES and watched this group, I have been on the front line of a brand new product with a smart and dedicated development team working steadily to improve the product. Six months ago, the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since six months ago, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation. But I am confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before it can be deployed to all of the folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handling all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the Node client) directly contradict the fact that logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe it into my Java bulk load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but won't be catastrophic. And the front-end following of rotated log files will be done using the GNU tail command with its uppercase -F option, which follows rotated log files perfectly. I doubt that logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying.

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:
We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1,000 IOPS guaranteed. The system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with the same number of machines. Unfortunately, with Elasticsearch you need almost double the amount of storage, plus a LOT of patience to make it run. It took us six months to set it up properly, and even now the system is quite buggy and unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system; for dev work it is OK, if you don't mind the hassle of setting up and operating it. The costs you save by not buying a Splunk license you have to invest in consultants to get it up and running. Our dev teams hate Elasticsearch and prefer Splunk.


On Saturday, 19 April 2014 at 00:07:44 UTC+2, Mark Walkom wrote:
That's a lot of data! I don't know of any installations that big but someone else might.

What sort of infrastructure are you running splunk on now, what's your current and expected retention?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [hidden email]
web: www.campaignmonitor.com


On 19 April 2014 07:33, Frank Flynn <[hidden email]> wrote:
We have a large Splunk instance.  We load about 1.25 TB of logs a day.  We have about 1,300 loaders (servers that collect and load logs; they may do other things too).

As I look at Elasticsearch / Logstash / Kibana, does anyone know of a performance comparison guide?  Should I expect to run on very similar hardware?  More?  Or less?

Sure, it depends on exactly what we're doing, the exact queries and how often we'd run them, but I'm trying to get some kind of idea before we start.

Are there any white papers or other documents about switching?  It seems an obvious choice, but I can find very few performance comparisons (I did see that Elasticsearch just hired "the former VP of Products at Splunk, Gaurav Gupta", but there were few numbers in that article either).

Thanks,
Frank


Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

Patrick Proniewski
In reply to this post by InquiringMind
Brian,

Thank you for the reply, even if it does not answer my question.

By the way, how am I supposed to change a mapping setting? Do I have to push back the entire mapping with one line modified, or can I just push something like:

{
  "logstash": {
     "mappings": {
        "_default_": {
           "_all": {
              "enabled": false
           }
        }
     }
  }
}
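As far as I know, PUT _template/<name> replaces the stored template rather than merging into it, so a fragment like the one above would wipe everything else in the template; the safer route is to re-push the complete template with the one line changed. A sketch, abbreviated to the fields from the GET _template/logstash output earlier in the thread:

```
# Replaces the whole "logstash" template; include every setting
# you want to keep, not just the changed line.
PUT _template/logstash
{
  "template": "logstash-*",
  "settings": { "index.refresh_interval": "5s" },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}
```

New daily indices then come up with _all disabled; existing indices keep their old mapping.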



On 20 June 2014, at 23:04, Brian wrote:

> Patrick,
>
> Here's my template, along with where the _all field is disabled. You may wish to add this setting to your own template, and then also add the index setting to ignore malformed data (if someone's log entry occasionally slips in "null" or "no-data" instead of the usual numeric value):
>
> {
>   "automap" : {
>     "template" : "logstash-*",
>     "settings" : {
>       "index.mapping.ignore_malformed" : true
>     },
>     "mappings" : {
>       "_default_" : {
>         "numeric_detection" : true,
>         "_all" : { "enabled" : false },
>         "properties" : {
>           "message" : { "type" : "string" },
>           "host" : { "type" : "string" },
>           "UUID" : {  "type" : "string", "index" : "not_analyzed" },
>           "logdate" : {  "type" : "string", "index" : "no" }
>         }
>       }
>     }
>   }
> }
>
> Brian


Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

InquiringMind
Patrick,

Well, I did answer your question. But probably not from the direction you expected.

When I create and manage specific indices, I lock down Elasticsearch. When I update the mappings, I understand that ES will not allow the mapping for an existing field to be modified in an incompatible way. So I only update to add new fields, and never to change or remove an existing field.

For time-based indices as used by the ELK stack, it makes the most sense to me to create an on-disk mapping template. So I always disable the _all field and pre-map a subset of string fields as shown in my previous post. I do this because when the next day arrives and logstash causes a new index to be created, that new index will also get my default mapping from the template.

I don't disable the _all field in an existing index that currently has it enabled. I don't know if it would succeed or fail, but I would not expect it to be successful.

Instead, based on my previous experience with ES, I disable the _all field and have disabled it from the very first test deployment of the ELK stack in our group. And then I configured my ES startup script to set message as the default field for a Lucene query. This was already set up and working when I let others have access to it for the very first time. So I don't know the answer to your specific question.

But I do know that a lot of experimentation went into my ELK configurations before I let anyone else look at it for the very first time. So don't be afraid to change your mappings and leave the old ones behind, and re-add data as needed to get everything just the way you want it.

Brian

On Monday, June 30, 2014 1:22:34 AM UTC-4, Patrick Proniewski wrote:
Brian,

Thank you for the reply, even if it does not answer my question.

By the way, how am I supposed to change a mapping setting? Do I have to push back the entire mapping with one line modified, or can I just push something like:

{
  "logstash": {
     "mappings": {
        "_default_": {
           "_all": {
              "enabled": false
           }
        }
     }
  }
}
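[For reference, the partial body sketched above, serialized the way it would be sent over the wire. Whether such a partial push is merged or rejected is exactly the open question in this thread; the index name and the ES 1.x put-mapping URL in the comment are assumptions for illustration only.]

```python
import json

# Patrick's partial mapping change as a standalone request body.
partial = {
    "_default_": {
        "_all": {"enabled": False}
    }
}

print(json.dumps(partial))
# Hypothetically, against an existing daily index (endpoint assumed):
#   curl -XPUT 'http://localhost:9200/logstash-2014.06.30/_mapping/_default_' \
#        -d '{"_default_": {"_all": {"enabled": false}}}'
```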



On 20 June 2014, at 23:04, Brian wrote:

> Patrick,
>
> Here's my template, along with where the _all field is disabled. You may wish to add this setting to your own template, and then also add the index setting to ignore malformed data (if someone's log entry occasionally slips in "null" or "no-data" instead of the usual numeric value):
>
> {
>   "automap" : {
>     "template" : "logstash-*",
>     "settings" : {
>       "index.mapping.ignore_malformed" : true
>     },
>     "mappings" : {
>       "_default_" : {
>         "numeric_detection" : true,
>         "_all" : { "enabled" : false },
>         "properties" : {
>           "message" : { "type" : "string" },
>           "host" : { "type" : "string" },
>           "UUID" : {  "type" : "string", "index" : "not_analyzed" },
>           "logdate" : {  "type" : "string", "index" : "no" }
>         }
>       }
>     }
>   }
> }
>
> Brian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2ff289e5-baf7-4d25-8412-8fcf967440fc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

Patrick Proniewski
Brian,

On 30 June 2014, at 22:59, Brian wrote:

> Well, I did answer your question. But probably not from the direction you expected.

hmm no, you didn't. My question was: "it looks like I can't retrieve/display [the _all field's] content. Any idea?" and you replied with your logstash template where _all is disabled.
I'm interested in disabling _all, but that was not my question at this point.
 
Your answer to my second message, below, is informative and interesting, but it does not answer my second question either. I simply asked whether I need to push back the complete modified mapping from my template, or whether I can push just the modified part (i.e. the _all: {enabled: false} part).


> When I create and manage specific indices, I lock down Elasticsearch. When I update the mappings, I understand that ES will not allow the mapping for an existing field to be modified in an incompatible way. So I only update to add new fields, and never to change or remove an existing field.
>
> For time-based indices as used by the ELK stack, it makes the most sense to me to create an on-disk mapping template. So I always disable the _all field and pre-map a subset of string fields as shown in my previous post. I do this because when the next day arrives and logstash causes a new index to be created, that new index will also pick up my default mappings from the template.
>
> I don't disable the _all field in an existing index that currently has it enabled. I don't know if it would succeed or fail, but I would not expect it to be successful.
>
> Instead, based on my previous experience with ES, I disable the _all field and have disabled it from the very first test deployment of the ELK stack in our group. And then I configured my ES startup script to set message as the default field for a Lucene query. This was already set up and working when I let others have access to it for the very first time. So I don't know the answer to your specific question.
>
> But I do know that a lot of experimentation went into my ELK configurations before I let anyone else look at it for the very first time. So don't be afraid to change your mappings and leave the old ones behind, and re-add data as needed to get everything just the way you want it.
>
> Brian
>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/B44B497A-5DC3-4BC5-9164-7F53B5D1D6B6%40patpro.net.
For more options, visit https://groups.google.com/d/optout.