Elasticsearch with large amount of data

Jeferson Martins
Hi,

I have 5 Elasticsearch nodes, each with 4 CPUs and 8 MB of RAM.

My index currently holds 1 TB of data and grows by about 100 GB per day. I configured 3 primary shards and 1 replica, but Elasticsearch runs out of memory (OutOfMemoryError) every two days.

Is there some configuration that would resolve this problem?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1c018f48-d760-43a3-9878-e3608a113d1f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch with large amount of data

Aaron Mefford
First, I am going to assume you mean 8 GB of memory; otherwise I am very impressed that Elasticsearch runs at all.

Second, when are you running out of memory?
Do you run out of memory while indexing?
  Is it a specific document when indexing?
Do you run out of memory when searching?
  Is it a specific search?
    What type of search, sort, or filter?
How many documents do you index each day?
  What is the largest document?
  What is the average document size?
  Are you indexing in batches?
    How big are your batches?
Of your 8 GB, how much is allocated to Elasticsearch?
  How much is left for the file system cache?
    (I usually start by giving 2 GB to the OS and splitting the remaining RAM between Elasticsearch and the file system cache. On an 8 GB machine, that means allocating 3 GB to Elasticsearch.)
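The rule of thumb above can be written as a tiny calculation. This is only a sketch of that heuristic: `split_memory` is a hypothetical helper, and the 2 GB OS reservation comes from this message, not from any Elasticsearch setting or API.

```python
# Rule of thumb from this thread (an assumption, not an official formula):
# reserve a fixed amount of RAM for the OS, then split the rest evenly
# between the Elasticsearch JVM heap and the file system cache.

def split_memory(total_gb: float, os_reserve_gb: float = 2.0) -> dict:
    """Return a suggested split (in GB) for OS, ES heap and file system cache."""
    if total_gb <= os_reserve_gb:
        raise ValueError("no RAM left after the OS reservation")
    remaining = total_gb - os_reserve_gb
    heap = remaining / 2
    return {"os": os_reserve_gb, "es_heap": heap, "fs_cache": remaining - heap}


if __name__ == "__main__":
    # Compare the machine sizes discussed in this thread.
    for total in (8, 16, 32):
        print(total, split_memory(total))
```

With the 2 GB reservation this gives 3 GB of heap on an 8 GB machine and 7 GB on a 16 GB machine, the same figures used elsewhere in this reply.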

As a rough guess based on the very little information you have provided, I would say that your cluster does not have enough RAM for the volume of data you are trying to load into it. In general, I have found that Lucene indexes like to be in memory; when they cannot be, performance is poor and operations can fail. By indexing 100 GB of data a day, you are asking Elasticsearch to manage some pretty large segments with only 8 GB of memory (effectively 3 GB for ES).
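To put rough numbers on this, here is a back-of-envelope calculation using the figures from the original post. It assumes (none of this is stated in the thread) that the 1 TB and 100 GB/day figures count primary data only, that one replica doubles the stored volume, and that data averages out evenly across the 5 nodes. The constant and function names are hypothetical.

```python
# Back-of-envelope sizing from the numbers in this thread.
# Assumptions (hypothetical): 1 TB and 100 GB/day are primary data only,
# each replica adds a full copy, and data is averaged evenly over nodes.

NODES = 5
PRIMARY_SHARDS = 3
REPLICAS = 1
PRIMARY_DATA_GB = 1000    # "1TB of data"
DAILY_GROWTH_GB = 100     # "about 100GBs by day"
HEAP_PER_NODE_GB = 3      # estimated ES heap on an 8 GB machine


def sizing(primary_gb: float) -> dict:
    """Derive stored volume, shard size and per-node load from primary data."""
    copies = 1 + REPLICAS
    total_gb = primary_gb * copies
    per_node_gb = total_gb / NODES
    return {
        "total_on_disk_gb": total_gb,
        "per_primary_shard_gb": primary_gb / PRIMARY_SHARDS,
        "per_node_gb": per_node_gb,
        "data_to_heap_ratio": per_node_gb / HEAP_PER_NODE_GB,
    }


if __name__ == "__main__":
    # Today, one week from now, and one month from now.
    for days in (0, 7, 30):
        primary = PRIMARY_DATA_GB + days * DAILY_GROWTH_GB
        print(days, sizing(primary))
```

Even under these assumptions, each node would already hold about 400 GB of index data against roughly 3 GB of heap, and each primary shard would be over 300 GB, with both numbers growing every day.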

From this page:
http://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing many, many small machines), and greater than 64 GB has problems that we will discuss in Heap: Sizing and Swapping.

Also review:
http://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
The default installation of Elasticsearch is configured with a 1 GB heap. For just about every deployment, this number is far too small. If you are using the default heap values, your cluster is probably configured incorrectly.
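To check whether your nodes are still running with the default heap, the nodes stats API (`GET /_nodes/stats/jvm`) reports each node's maximum and used heap under `jvm.mem`. The sketch below assumes an unauthenticated cluster at `http://localhost:9200`; `heap_report`, `fetch_stats` and the 1.05 GB threshold are illustrative choices, not part of Elasticsearch.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster
ONE_GB = 1024 ** 3


def heap_report(stats: dict) -> list:
    """Summarise per-node heap from a GET /_nodes/stats/jvm response body."""
    rows = []
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        max_gb = mem["heap_max_in_bytes"] / ONE_GB
        rows.append({
            "node": node.get("name", "?"),
            "heap_max_gb": round(max_gb, 2),
            "heap_used_percent": mem["heap_used_percent"],
            # A max heap of about 1 GB suggests the default was never changed
            # (the JVM usually reports slightly less than the -Xmx value).
            "looks_like_default": max_gb <= 1.05,
        })
    return rows


def fetch_stats(url: str = ES_URL) -> dict:
    """Fetch JVM node stats from the cluster as parsed JSON."""
    with urllib.request.urlopen(f"{url}/_nodes/stats/jvm", timeout=10) as resp:
        return json.load(resp)
```

Usage: `for row in heap_report(fetch_stats()): print(row)`. Keeping the parsing separate from the HTTP call makes it easy to run the same check against a saved stats response.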

I ran into similar problems when indexing on machines that had only 8 GB of memory, and my data volume was lower than what you have indicated. Upgrading to larger instances with 16 GB resolved the issue, and I have not had a problem since. Of course, I had already tuned everything according to what I outlined above. A 16 GB box means that instead of 3 GB for ES you have (16 GB - 2 GB) / 2 = 7 GB, more than double. In consulting engagements I always recommend 16 GB as a bare minimum, but 32 GB as a realistic minimum.

This page also has some good info on it:

https://www.found.no/foundation/sizing-elasticsearch/

Aaron


On Thursday, March 12, 2015 at 6:12:11 PM UTC-6, Jeferson Martins wrote:
Hi,

I have 5 nodes of ElasticSearch with 4 CPUs, 8 Mbs of RAM.

My Index today have 1TB of data and my index have about 100GBs By day and i configure 3 primary shards and 1 replica but my elasticsearch gets OutOfMemoy in every two days.

There is some configuration to resolve this problem?
