Question about s3 gateway vs EBS


Gautam-2
I'm about to set up a new cluster on EC2, and the idea of using the S3 gateway looks very appealing: it would be great if I could avoid dealing with EBS variable latencies, striping, etc. However, I got the sense from reading posts here and elsewhere that S3 is not the preferred solution if I expect a reasonably high load on my cluster. Is that the general feeling? If so, I'm comfortable enough with EBS that I'd go with it instead.

Re: Question about s3 gateway vs EBS

Ivan Brusic
S3 will not help you alleviate any latency issues since its
performance is no better than EBS.


Re: Question about s3 gateway vs EBS

kimchy
Administrator
Yeah, the S3 gateway basically just snapshots the current state of the locally stored indices; you still need to store the index locally. That can be on ephemeral drives or EBS.



Re: Question about s3 gateway vs EBS

Karel Minařík
Shay, is there any "definitive guide" explaining the differences
between S3 and EBS based persistence?

The way I see it, it's about:


a) S3 is easy to set up and use, it works "out of the box"
b) recreating the state of a large cluster will be painful from S3,
since the data must be physically copied to machines; with EBS, the
data is "already there"

Correct?

Thanks!

Karel



Re: Question about s3 gateway vs EBS

kimchy
Administrator
Yeah, b) is the important part. Also, while the system is running, S3 needs to be kept updated with the current state of the index (in shared gateway mode).

In the future, I hope to combine the two: use the local gateway (on EBS or not), snapshot the state to S3 whenever you want, and recover from it if needed.



Re: Question about s3 gateway vs EBS

James Cook-3
In reply to this post by kimchy
Or memory

Re: Question about s3 gateway vs EBS

kimchy
Administrator
You mean indices that are memory based? Yeah, in this case the S3 gateway option is pretty cool, since you can store the index completely in memory but still be able to recover if nodes fail (even if all replicas are gone).



Re: Question about s3 gateway vs EBS

Karel Minařík
Yeah, the memory option would be pretty powerful for some
well-specified cases!

At the moment, we're using S3 on AWS; yesterday I did a quick test and
I was *very* satisfied with the recovery speed. The setup:

* One index, ~600,000 docs
* 6.5GB

The full recovery cycle -- launch two m1.large nodes (<1min),
bootstrap and provision them with Chef (<3min), and reach "green"
state -- took something like 10 minutes. Effectively ~1GB per minute.
That's enough "realtime" in my book :)
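
The per-minute figure above depends on which part of the roughly 10 minutes is counted. Here is a rough sketch of the arithmetic. It takes the launch (<1min) and Chef provisioning (<3min) upper bounds from the post and assumes that all remaining time was spent recovering data from S3; that split is an assumption for illustration, not something that was measured.

```python
# Rough throughput estimate for the recovery described above.
# Sizes and times come from the post; the split between setup time
# and data-copy time is an assumption made for illustration.

INDEX_SIZE_GB = 6.5        # index size reported in the post
TOTAL_MINUTES = 10.0       # "something like 10 minutes" to reach green
LAUNCH_MINUTES = 1.0       # upper bound for launching the nodes (<1min)
PROVISION_MINUTES = 3.0    # upper bound for Chef provisioning (<3min)


def throughput_gb_per_min(size_gb: float, minutes: float) -> float:
    """Return the average throughput in GB per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return size_gb / minutes


# Counting the whole cycle, including launch and provisioning.
end_to_end = throughput_gb_per_min(INDEX_SIZE_GB, TOTAL_MINUTES)

# Counting only the time assumed to be spent recovering from S3.
copy_minutes = TOTAL_MINUTES - LAUNCH_MINUTES - PROVISION_MINUTES
copy_only = throughput_gb_per_min(INDEX_SIZE_GB, copy_minutes)

print(f"end to end: {end_to_end:.2f} GB/min")
print(f"copy only:  {copy_only:.2f} GB/min")
```

Under these assumptions the end-to-end rate comes out below 1GB per minute, and the copy-only rate comes out slightly above it, so "~1GB per minute" is a fair round number for the recovery itself.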

Karel
