Setting up Elastic search on EC2 - size and number?

timrobertson100
Hey,

I am about to index about 200 million records from a tab-delimited
file with 23-40 properties per line (most of them indexed). The data
will probably be about 150GB as JSON.

Before I start, does anyone have a feel for which instance types, and
how many, I would need (single-client throughput only right now)?
Would 3 large instances (7.5GB memory each) do, or would I be better
off with a bunch of smaller ones?

Cheers,
Tim
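
For readers following along, the figures in the question can be turned into a quick back-of-envelope calculation. This is only a sketch: it assumes decimal gigabytes, and the function name `sizing` is made up for illustration. It says nothing about the on-disk index size, which can differ a lot from the raw JSON size.

```python
GB = 10**9  # assumption: decimal gigabytes


def sizing(total_bytes, record_count, nodes, ram_per_node_bytes):
    """Return (average record size, total cluster RAM, RAM-to-data ratio)."""
    avg_record_bytes = total_bytes / record_count
    cluster_ram_bytes = nodes * ram_per_node_bytes
    ram_to_data_ratio = cluster_ram_bytes / total_bytes
    return avg_record_bytes, cluster_ram_bytes, ram_to_data_ratio


# Figures from the post: 150GB of JSON, 200 million records,
# 3 large instances with 7.5GB of memory each.
avg, ram, ratio = sizing(150 * GB, 200_000_000, 3, 7.5 * GB)
print(f"average record: {avg:.0f} bytes")
print(f"cluster RAM: {ram / GB:.1f} GB ({ratio:.0%} of the JSON size)")
```

With these inputs the average record comes to about 750 bytes, and three large instances together hold roughly 15% of the raw JSON size in RAM.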

Re: Setting up Elastic search on EC2 - size and number?

kimchy
Administrator
I think even two should be enough. Since you have a single client indexing, the question is how you can parallelize it (even if it's a single process, consider using threads). I have a feeling that you might bottleneck on the client side before you bottleneck on the elasticsearch side. If you see that your client can push more than elasticsearch can handle, then it makes sense to add another machine.
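
As an illustration of the threading suggestion, here is a minimal sketch of a single-process client that parses tab-delimited lines into JSON and hands batches to several worker threads. Everything in it is an assumption rather than part of the thread: the names `parse_line`, `batches`, and `index_parallel` are made up, and `send_batch` is a hypothetical callable that you would supply to ship a batch to the cluster. Error handling is left out.

```python
import json
import queue
import threading
from itertools import islice


def parse_line(line, field_names):
    """Turn one tab-delimited line into a JSON document string."""
    values = line.rstrip("\n").split("\t")
    return json.dumps(dict(zip(field_names, values)))


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_parallel(lines, field_names, send_batch, workers=8, batch_size=1000):
    """Feed batches of documents to `send_batch` from `workers` threads.

    `send_batch` is a hypothetical callable: it takes a list of JSON
    strings and returns how many documents it sent.
    """
    # Bounded queue, so the reader cannot run far ahead of the senders.
    work = queue.Queue(maxsize=workers * 2)
    sent_counts = []
    lock = threading.Lock()

    def worker():
        while True:
            batch = work.get()
            if batch is None:  # sentinel: no more work
                return
            count = send_batch(batch)
            with lock:
                sent_counts.append(count)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    docs = (parse_line(line, field_names) for line in lines)
    for batch in batches(docs, batch_size):
        work.put(batch)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sum(sent_counts)
```

Passing an open file object as `lines` streams the input rather than loading it all at once. If the senders keep up with the reader at a given `workers` setting, the client is the bottleneck, which is the case described above.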

If you are using a large instance, make sure that you set the -Xmx parameter to a higher value (by default it is -Xmx1g) so elasticsearch can make use of more of the memory available on the machine.
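
A small sketch of how one might derive a value for that flag from the instance memory. The 50% fraction is an assumption chosen for illustration, not a figure from this thread, and `jvm_heap_flag` is a made-up helper. How the flag reaches the JVM depends on your startup script.

```python
def jvm_heap_flag(ram_gb, fraction=0.5):
    """Build a -Xmx flag that gives the JVM `fraction` of the machine's RAM.

    `fraction` is an illustrative assumption; leave room for the OS
    and its file system cache.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    heap_mb = int(ram_gb * 1024 * fraction)
    return f"-Xmx{heap_mb}m"


# A large instance with 7.5GB of memory, as in the original question.
print(jvm_heap_flag(7.5))
```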

-shay.banon


Re: Setting up Elastic search on EC2 - size and number?

timrobertson100
Thanks Shay, I'll try 2 and see how they perform.

I expect that for decent search response times I will need to hold the indexes in memory, so I expect more instances will be needed later.




Re: Setting up Elastic search on EC2 - size and number?

kimchy
Administrator
Do you mean storing the index in memory? It really depends on the file system performance on Amazon, I guess, but on local (non-virtualized) disks you will be surprised at the performance. If you get a chance to compare the two, it would be interesting to hear the results...

-shay.banon


Re: Setting up Elastic search on EC2 - size and number?

Paul Loy
Hi Tim,

I am very interested to learn how your experiment went/is going. I'm leading the development of an internal middleware solution which must work both in a traditional hosted environment and AWS/EC2. Being able to run Elastic Search on EC2 will help my tech selection efforts.

Any info would be great.

Many thanks,

Paul.

--
---------------------------------------------
Paul Loy
[hidden email]
http://www.keteracel.com/paul

Re: Setting up Elastic search on EC2 - size and number?

Paolo Castagna
Paul Loy wrote:
> I am very interested to learn how your experiment went/is going. I'm
> leading the development of an internal middleware solution which must
> work both in a traditional hosted environment and AWS/EC2. Being able to
> run Elastic Search on EC2 will help my tech selection efforts.
>
> Any info would be great.

+1 :-)

Thanks,
Paolo

Re: Setting up Elastic search on EC2 - size and number?

timrobertson100
The short answer is I got sidetracked... it is on the list of things
to do and I will share all findings.
It will certainly work; I am just curious how expensive it becomes for decent throughput.



