Heap sizing

Heap sizing

Grant
Hi all,
I'm running my cluster in the cloud with local storage at present
(meaning storage local to the hosts; I'm also using the local gateway
for storage). At present our nodes have 4 GB of RAM, of which I'm
allocating 3 GB to the JVM. Is there some consensus in the ES community
on heap sizing? Do I want to keep my JVM trimmed to take more advantage
of the disk cache, or should I allocate all available RAM to the heap?

Re: Heap sizing

Karussell
Rule of thumb: half for the JVM, half for the disk cache.
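
(For instance, on these 4 GB nodes that rule works out to roughly a 2 GB
heap. A minimal sketch of how that is typically set, assuming the stock
startup scripts, which honor the ES_HEAP_SIZE environment variable:)

    # Give the JVM half of the machine's 4 GB; the OS uses the rest
    # for the disk cache. In the stock scripts ES_HEAP_SIZE sets both
    # the min and max heap (equivalent to -Xms2g -Xmx2g, so the heap
    # doesn't resize at runtime).
    export ES_HEAP_SIZE=2g
    bin/elasticsearch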

Peter.

Re: Heap sizing

kimchy
Administrator
In reply to this post by Grant
Hard to say without more data. You can check the memory usage behavior using something like bigdesk.
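
(If you'd rather poll the API directly than run bigdesk, the nodes stats
endpoint exposes the same JVM heap numbers; the exact path and flags
below are a sketch and have varied across ES versions:)

    # Fetch per-node JVM stats (heap used, heap committed, GC counts).
    # Newer versions expose this under /_nodes/stats instead.
    curl -s 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty=true'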

Re: Heap sizing

Grant
Right now between half and 75% of the heap allocation is actually being
used, depending on what's going on. But our data set is much smaller
than it will eventually be (although obviously we'll add nodes as
required to balance RAM requirements against data size).

If all the disk caching is done outside the JVM, then at my present
allocation I've only got 1 GB of free RAM... if I need to keep my data
set that small per node to ensure most of it resides in the disk cache,
I think I'll need to reduce the heap.

Our data set is going to be composed of a LOT of very small indices (in
the neighborhood of a few hundred KB to ~50 MB at the top end). We're
running with one shard (i.e. no sharding) and 3 replicas per index.
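
(For reference, that per-index layout would be declared at index-creation
time roughly as follows; the index name is a made-up example:)

    # Create an index with a single shard and three replicas
    # ("myindex" is a placeholder name).
    curl -XPUT 'http://localhost:9200/myindex' -d '{
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 3
      }
    }'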

Re: Heap sizing

kimchy
Administrator
Disk cache is there to help the OS speed up operations; the more RAM you leave for it, the better.

But you say you are going to have a LOT of small indices. Even with one shard per index, you will probably overload the cluster you have, since a single shard is not lightweight (unless you are going to have a large cluster). See here for how to use routing to solve something like this: https://groups.google.com/forum/#!searchin/elasticsearch/data$20flow/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
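
(A minimal sketch of that routing approach, assuming one shared index in
place of many small ones; all index, type, and field names below are
made up:)

    # Route each tenant's documents to one shard of a shared index
    # by passing the same routing value at index time.
    curl -XPUT 'http://localhost:9200/shared/doc/1?routing=tenant42' -d '{
      "tenant": "tenant42",
      "body": "an example document"
    }'

    # Search with the same routing value so only that shard is queried,
    # filtering on the tenant field to exclude co-located tenants.
    curl -XGET 'http://localhost:9200/shared/doc/_search?routing=tenant42' -d '{
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": { "term": { "tenant": "tenant42" } }
        }
      }
    }'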
