Memory usage

Memory usage

Clinton Gormley
Hiya

I've just tried loading a million documents into ElasticSearch, running
on a small dev server, and memory usage grew until eventually there was
none left, and it refused to accept any more docs.

I switched to using the file system rather than memory, and everything
worked nicely (except a bit slower, obviously).

However, I have another 4 million docs to load which will take up a LOT
of memory.

Does the sharding mean that: if the memory usage of the node in a
cluster with a single node is 4GB, then the memory usage on each node in
a cluster with 4 nodes will be 1GB (approx)?

thanks

clint
--
Web Announcements Limited is a company registered in England and Wales,
with company number 05608868, with registered address at 10 Arvon Road,
London, N5 1PR.


Re: Memory usage

kimchy
On Thu, Mar 18, 2010 at 4:29 PM, Clinton Gormley <[hidden email]> wrote:
> Hiya
>
> I've just tried loading a million documents into ElasticSearch, running
> on a small dev server, and memory usage grew until eventually there was
> none left, and it refused to accept any more docs.
>
> I switched to using the file system rather than memory, and everything
> worked nicely (except a bit slower, obviously).

Did you use the native memory one (i.e. in 0.5 and above, type: memory in the store with no other argument)? In theory, you are then bounded by the physical memory, and not by how much memory you allocate to the JVM. In this case, by the way, I suggest using a large bufferSize.
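[For reference, a minimal sketch of the store configuration described above; only "type: memory" comes from the answer itself, while the surrounding YAML layout is illustrative and the exact 0.5-era spelling may differ:

    # elasticsearch.yml (illustrative): native memory store,
    # bounded by physical RAM rather than the JVM heap
    index:
      store:
        type: memory
]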
 

> However, I have another 4 million docs to load which will take up a LOT
> of memory.
>
> Does the sharding mean that: if the memory usage of the node in a
> cluster with a single node is 4GB, then the memory usage on each node in
> a cluster with 4 nodes will be 1GB (approx)?

Yep, that's the idea. Don't forget the replicas, though: if you have 5 shards with 1 replica each, then 1 node will take 4GB, and two nodes will each take 4GB (because of the replicas).
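[Unpacking that example, assuming, as in later elasticsearch versions, that a replica is never allocated on the same node as its primary:

    index settings: 5 shards, 1 replica -> 10 shard instances in total

    1 node : 5 primaries allocated, replicas unassigned -> ~4GB
    2 nodes: 5 instances on each node                   -> ~4GB per node
]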
 




Re: Memory usage

Clinton Gormley

>
> Did you use the native memory one (i.e. in 0.5 and above, type: memory
> in the store with no other argument)?

yes

> In theory, you are then bounded by the physical memory, and not by how
> much memory you allocate to the JVM.

yes - it was a small machine - only 2GB of memory.  But that said,
700,000 objects were using 1.4GB.  I currently need to index 5 million
objects, which will be a lot of memory :)

> In this case, by the way, I suggest using a large bufferSize.

You mean when using index.store.type = 'memory'?  Why a large
bufferSize?  And how big is considered large?

>
> Yep, that's the idea. Don't forget the replicas, though: if you have 5
> shards with 1 replica each, then 1 node will take 4GB, and two nodes
> will each take 4GB (because of the replicas).

OK, so to check I understand this:

I have (e.g.) 5GB of data when running with one node, which has 5 shards.

If I start 5 nodes, with 5 shards and 2 replicas, then I would have:
 - 2 nodes using 5GB
 - 3 nodes using 1GB

Is this correct?

Still trying to get my head around how this all works :)

(and why aren't you on #elasticsearch on freenode ;)

ta

clint


Re: Memory usage

kimchy
Answers below. But before that, let me point you to elasticsearch's multi-index support. It basically means that you can have one index in memory, which is smaller, and another index that is stored on the file system. For example, if you can break your documents up based on types, it might make sense. Remember that you can search across several indices with elasticsearch.
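[A hedged sketch of such a split; the index names are hypothetical and the per-index settings would be supplied when each index is created, but the two store types are the ones discussed in this thread:

    # hypothetical "hot" index, kept in memory
    index:
      store:
        type: memory

    # hypothetical "archive" index, kept on the file system
    index:
      store:
        type: fs

A query can then address both indices at once; in later REST syntax that is something like /hot,archive/_search, though the exact 0.5-era invocation may differ.]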

On Thu, Mar 18, 2010 at 5:31 PM, Clinton Gormley <[hidden email]> wrote:

>>
>> Did you use the native memory one (i.e. in 0.5 and above, type: memory
>> in the store with no other argument)?

> yes

>> In theory, you are then bounded by the physical memory, and not by how
>> much memory you allocate to the JVM.

> yes - it was a small machine - only 2GB of memory.  But that said,
> 700,000 objects were using 1.4GB.  I currently need to index 5 million
> objects, which will be a lot of memory :)

So it means that if you have 5 simple machines with 4GB each, you would have 20GB of memory :).
 

>> In this case, by the way, I suggest using a large bufferSize.

> You mean when using index.store.type = 'memory'?  Why a large
> bufferSize?  And how big is considered large?

I would say a bufferSize of 100k - 200k is a good value.
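[A sketch of how that suggestion might look in configuration; the buffer_size key name here is an assumption (the exact 0.5-era setting may be spelled differently), and only the memory store type and the 100k - 200k sizing come from this thread:

    index:
      store:
        type: memory
        memory:
          buffer_size: 200k   # assumed key name; sizing per the advice above
]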
 

>>
>> Yep, that's the idea. Don't forget the replicas, though: if you have 5
>> shards with 1 replica each, then 1 node will take 4GB, and two nodes
>> will each take 4GB (because of the replicas).

> OK, so to check I understand this:
>
> I have (e.g.) 5GB of data when running with one node, which has 5 shards.
>
> If I start 5 nodes, with 5 shards and 2 replicas, then I would have:
>  - 2 nodes using 5GB
>  - 3 nodes using 1GB
>
> Is this correct?

Let me simplify the math. If you have 5 shards, each with 2 replicas, you have, in total, 5 * (2 + 1) = 15 shard instances running (the primary shards and their replicas).

If you have 5GB with one node that has 5 shards, then let's assume we have 1GB per shard. This means that for 15 shard instances you would need 15GB.
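[Spelled out, and extended to the per-node question, assuming the shard instances spread evenly across the nodes (an assumption the answer above does not state explicitly):

    shard instances    = shards * (replicas + 1) = 5 * (2 + 1) = 15
    memory per shard   ~ 5GB / 5 shards          = 1GB
    total memory       ~ 15 * 1GB                = 15GB
    per node (5 nodes) ~ 15 / 5 = 3 instances    = ~3GB per node
]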
 

> Still trying to get my head around how this all works :)
>
> (and why aren't you on #elasticsearch on freenode ;)

Firewall..., home now, on now...
 
