Elastic search

Mohit Anchlia
I am new to Elasticsearch and trying to understand its concepts. I am
looking for information on:

1) How it distributes and replicates data for HA.
2) Where it stores the data.
3) Optimization techniques.

Re: Elastic search

Paul Loy
1) ES shards and replicates indexes. It is what I would call 'statically sharded' - that is you specify up front the number of shards and replicas you want and that's how many there will be. Shards and replicas are then allocated to nodes in your cluster.

2) Up to you: http://www.elasticsearch.org/guide/reference/index-modules/store.html

3) Depends upon your use case. Everyone's data and everyone's indexes will be different.
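
A minimal sketch of why the shard count in point 1 is fixed up front, assuming documents are placed by hashing a routing value (such as the document ID) modulo the number of shards. The `crc32` hash and the `doc-N` IDs are stand-ins chosen for illustration, not what ES itself uses.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard by hashing the routing value modulo the shard count."""
    # crc32 is a stand-in hash for this sketch, not the hash ES uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# Hypothetical document IDs, placed once with 5 shards and once with 6.
doc_ids = [f"doc-{i}" for i in range(100)]
placement_5 = {d: shard_for(d, 5) for d in doc_ids}
placement_6 = {d: shard_for(d, 6) for d in doc_ids}

# Count how many documents would have to move if the shard count changed.
moved = sum(1 for d in doc_ids if placement_5[d] != placement_6[d])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Because the shard number depends on the shard count, changing that count later would send lookups for existing documents to the wrong shard unless the data were reindexed.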

--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy

Re: Elastic search

Mohit Anchlia
On Sun, Aug 7, 2011 at 3:56 PM, Paul Loy <[hidden email]> wrote:
> 1) ES shards and replicates indexes. It is what I would call 'statically
> sharded' - that is you specify up front the number of shards and replicas
> you want and that's how many there will be. Shards and replicas are then
> allocated to nodes in your cluster.

Is there a link where I can read how to configure that? Also, does this
make it HA, e.g. if one node goes down, is searching unaffected?

>
> 2) Up to you:
> http://www.elasticsearch.org/guide/reference/index-modules/store.html

How do I decide which one to use? I also see that it integrates with CouchDB.
With TBs of data, is it OK to keep it on the file system?
>
> 3) Depends upon your use case. Everyone's data and everyone's indexes will
> be different.

Are there any general guidelines that apply to everyone, or that at least
give a little more insight into designing this efficiently?


Re: Elastic search

Paul Loy


On Mon, Aug 8, 2011 at 6:34 AM, Mohit Anchlia <[hidden email]> wrote:
> On Sun, Aug 7, 2011 at 3:56 PM, Paul Loy <[hidden email]> wrote:
>> 1) ES shards and replicates indexes. It is what I would call 'statically
>> sharded' - that is you specify up front the number of shards and replicas
>> you want and that's how many there will be. Shards and replicas are then
>> allocated to nodes in your cluster.
>
> Is there a link where I can read how to configure that? Also, does this
> make it HA, e.g. if one node goes down, is searching unaffected?

Basic configuration will be the index settings where you can set the number of shards and the number of replicas of an index.

http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html

What's awesome with ES is that you can specify this on a per index basis. So more critical indices can have a higher number of replicas.
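
As a rough sketch of the per-index settings described above: the request below sets `number_of_shards` and `number_of_replicas` when creating an index. The host `http://localhost:9200` and the index names `orders` and `scratch` are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def create_index_request(base_url: str, index: str, shards: int, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical indices: a critical one gets more replicas than a disposable one.
critical = create_index_request("http://localhost:9200", "orders", shards=5, replicas=2)
scratch = create_index_request("http://localhost:9200", "scratch", shards=1, replicas=0)

print(critical.get_method(), critical.full_url)
print(critical.data.decode("utf-8"))
# To send it against a running node: urllib.request.urlopen(critical)
```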

Regarding HA, the way I understand it (and Shay can probably correct me if I'm wrong), there is a 'master' node for each shard. If that node dies, another node holding a replica is voted the 'master', so searches should not be impacted if a node goes down. Obviously, if you had just enough nodes for one per shard and a node goes down, then one node will have to serve searches for two shards and so may be slower. So while you can still run searches, you'll need to think about redundancy in your cluster.
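
A toy model of the failover behaviour described above, under the assumption that each shard keeps an ordered list of copies and the first live copy acts as the 'master'. The `Cluster` class and node names are invented for illustration; this is not how ES implements allocation or election.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # shard id -> ordered list of nodes holding a copy; index 0 acts as 'master'
    copies: dict[int, list[str]] = field(default_factory=dict)

    def fail_node(self, node: str) -> None:
        """Drop every copy on the failed node; the next copy in line takes over."""
        self.copies = {
            shard: [n for n in nodes if n != node]
            for shard, nodes in self.copies.items()
        }

    def searchable(self) -> bool:
        """The whole index is searchable only if every shard still has a copy."""
        return all(self.copies.values())

    def load(self) -> dict[str, int]:
        """Number of shards for which each node currently acts as 'master'."""
        counts: dict[str, int] = {}
        for nodes in self.copies.values():
            if nodes:
                counts[nodes[0]] = counts.get(nodes[0], 0) + 1
        return counts


# Three shards, one replica each, spread over three hypothetical nodes.
cluster = Cluster({
    0: ["node-a", "node-b"],
    1: ["node-b", "node-c"],
    2: ["node-c", "node-a"],
})
cluster.fail_node("node-a")
print(cluster.searchable(), cluster.load())
```

After `node-a` fails, every shard still has a copy, but `node-b` now leads two shards, which matches the point about searches continuing at a possible cost in speed.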
 
> How do I decide which one to use? I also see that it integrates with CouchDB.
> With TBs of data, is it OK to keep it on the file system?

This will be better answered by someone on this list who also pushes TBs of data. I'm only at the GB scale, so I use S3 as a gateway just to be sure. I guess the quick answer is that you can scale out to meet your needs: if the FS is a bottleneck, you can add more nodes.
 
>> 3) Depends upon your use case. Everyone's data and everyone's indexes will
>> be different.
>
> Are there any general guidelines that apply to everyone, or that at least
> give a little more insight into designing this efficiently?

Lots, and it really depends on your data and how you want to search it. One tip I've used is to use filters as much as possible, which seems to have given us a very stable, low-latency ES cluster.
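
To make the filter tip concrete, here is a sketch of a search body that keeps the scored full-text part small and pushes exact-match conditions into filters, which skip relevance scoring and can be cached. It uses the `bool` query with a `filter` clause found in later Elasticsearch versions; releases from the time of this thread expressed the same idea with a `filtered` query, so check the documentation for your version. The field names `body`, `status` and `region` are hypothetical.

```python
import json


def filtered_search(text: str, exact_filters: dict[str, str]) -> dict:
    """Combine a scored full-text clause with non-scoring exact-match filters."""
    return {
        "query": {
            "bool": {
                # Scored: contributes to relevance ranking.
                "must": [{"match": {"body": text}}],
                # Not scored: only narrows the set of matching documents.
                "filter": [
                    {"term": {field_name: value}}
                    for field_name, value in exact_filters.items()
                ],
            }
        }
    }


query = filtered_search("disk failure", {"status": "open", "region": "eu"})
print(json.dumps(query, indent=2))
```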
 

Re: Elastic search

Mohit Anchlia
In reply to this post by Paul Loy
Are there any recommendations on when to use a DB rather than the file system?

Our use case is simple:

1. We have tons of column names and values in NoSQL column families
that we need search capabilities on, since Cassandra isn't really very
good when you need lots of indexes. These are mostly distinct values.
2. We have XML docs with attributes that we need to search on.
These have low cardinality.
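
For the second use case, one possible preprocessing step is to flatten the XML attributes into a simple field-to-values document before indexing it. This is only a sketch: the `order`/`item` sample and the `tag.attribute` field naming are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def attributes_to_document(xml_text: str) -> dict[str, list[str]]:
    """Collect every attribute in the XML into a flat field -> values mapping."""
    doc: dict[str, list[str]] = {}
    for element in ET.fromstring(xml_text).iter():
        for name, value in element.attrib.items():
            # Field name is "<tag>.<attribute>", e.g. "item.sku".
            doc.setdefault(f"{element.tag}.{name}", []).append(value)
    return doc


# Hypothetical XML document with a few searchable attributes.
sample = '<order status="open"><item sku="A1" color="red"/><item sku="B2" color="red"/></order>'
print(attributes_to_document(sample))
```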


Re: Elastic search

kimchy
Administrator
What kind of recommendations are you after? Not sure I understand the question properly to answer it...


Re: Elastic search

Mohit Anchlia
On Sat, Aug 13, 2011 at 9:45 AM, Shay Banon <[hidden email]> wrote:
> What kind of recommendations are you after? Not sure I understand the
> question properly to answer it...

How do I decide whether to use the file system or CouchDB? Why would
people choose one over the other? Is it just because you can see the data
in some form directly in the DB?


Re: Elastic search

kimchy
Administrator
Still not understanding... Use a file system or use CouchDB? How does that relate to elasticsearch? If it doesn't, I can still try to help :), but I need more info. Do you want to store blobs on the file system?


Re: Elastic search

Mohit Anchlia
On Sat, Aug 13, 2011 at 12:43 PM, Shay Banon <[hidden email]> wrote:
> Still not understanding... Use a file system or use CouchDB? How does that
> relate to elasticsearch? If it doesn't, I can still try to help :), but I
> need more info. Do you want to store blobs on the file system?


From what I understand, indexes are stored somewhere on disk, and
from the link http://www.elasticsearch.org/guide/reference/index-modules/store.html
it looks like there are various options. So I am trying to understand
whether they should be stored on the file system or in some DB like CouchDB.

Doesn't elasticsearch store indexed data somewhere?


Re: Elastic search

James Cook
Hi Mo,

There seems to be a disconnect between your questions and a fundamental understanding of how ES (and Lucene) works. I think you need to read the website a bit more, and especially take a look at this video:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html

Index storage is under the control of Lucene, and the store page you link to describes your options: simplefs and niofs are file-based, memory is memory-based, and mmapfs is a hybrid of the two. I'm not sure where you got the idea that indexes can also be stored in a DB like CouchDB.
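
As a sketch of how one of these store options might be selected, the helper below builds index settings with `index.store.type` set to one of the types named above. The `store_settings` helper is hypothetical, and the set of valid store types differs between Elasticsearch versions, so treat the list as an assumption taken from this thread.

```python
# Store types named in this thread; other versions may accept a different set.
STORE_TYPES = {"simplefs", "niofs", "mmapfs", "memory"}


def store_settings(store_type: str) -> dict:
    """Build index settings that select where Lucene keeps the index files."""
    if store_type not in STORE_TYPES:
        raise ValueError(f"unknown store type: {store_type!r}")
    return {"settings": {"index": {"store": {"type": store_type}}}}


print(store_settings("niofs"))
```

Settings like these would be supplied when the index is created; they choose a Lucene directory implementation and have nothing to do with external databases.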

There is the concept of a River which is a bridge between CouchDB (and others) and ES. A River will receive push changes or periodically will pull changes from a source (like CouchDB, not sure if CouchDB River pushes or pulls) and index the data it receives. This is a technique that can be used to put things for searching into ES without the developer having to specifically index documents into ES. It has nothing to do with how data is stored in ES.
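
The pull variant of this idea can be sketched as a loop that asks the source for changes newer than the last one seen and indexes whatever comes back. Everything here is a stand-in: `PullingRiver`, `fetch_since` and the in-memory `source_log` are invented for illustration and are not the actual River API or the CouchDB changes feed.

```python
from typing import Callable

# (sequence number, document id, document body)
Change = tuple[int, str, dict]


class PullingRiver:
    """Periodically pulls changes from a source and indexes them."""

    def __init__(self, fetch_changes: Callable[[int], list[Change]]):
        self.fetch_changes = fetch_changes
        self.last_seq = 0
        self.index: dict[str, dict] = {}  # stands in for the search index

    def poll_once(self) -> int:
        """Fetch changes newer than last_seq, index them, return how many."""
        changes = self.fetch_changes(self.last_seq)
        for seq, doc_id, body in changes:
            self.index[doc_id] = body
            self.last_seq = max(self.last_seq, seq)
        return len(changes)


# Hypothetical source: an append-only change log kept in memory.
source_log: list[Change] = [
    (1, "a", {"title": "first"}),
    (2, "b", {"title": "second"}),
]


def fetch_since(seq: int) -> list[Change]:
    return [change for change in source_log if change[0] > seq]


river = PullingRiver(fetch_since)
first_batch = river.poll_once()
source_log.append((3, "a", {"title": "first, edited"}))
second_batch = river.poll_once()
print(first_batch, second_batch, river.index)
```

The application only writes to the source; the river keeps the search index in step without the developer indexing each document explicitly.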

-- jim

Re: Elastic search

Mohit Anchlia
Thanks for clarifying. I will go through that presentation.
