ElasticSearch vs NoSQL

ElasticSearch vs NoSQL

Gísli Kristjánsson
I must begin by praising the effort. I have been in NoSQL research
mode for the last few days and I'm still discovering cool new
projects. From what I've seen on the official website (screencast,
docs, and forum), ElasticSearch is definitely one of the more
impressive projects so far (even though I only discovered it a couple
of hours ago).

As I'm now comfortable with the CAP Theorem I think I'm getting ready
to pick my cherry from the myriad of NoSQL options. The top contenders
for me at the moment are MongoDB and Riak. I lean towards Riak (from a
design and implementation perspective) but MongoDB's query language
seems very powerful.

After reading the "NoSQL, Yes Search" blog post
(http://www.elasticsearch.com/blog/2010/02/25/nosql_yessearch.html), I
concluded that Riak combined with search provided by ElasticSearch
might be the perfect combination (as described in the blog entry).

After a while I started asking myself: why do I need to use
ElasticSearch as a supplement to another NoSQL implementation, since
the whole object seems to be stored within ElasticSearch anyway? This
seems to be further supported by the introduction of binary attachments
(http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/f0a26efd88365bad#).
Am I missing something here, or is ElasticSearch only meant to be used
in conjunction with another datastore?

Re: ElasticSearch vs NoSQL

Sergio Bossa
2010/4/5 Gísli Kristjánsson <[hidden email]>:

> After reading the NoSQL, Yes Search (http://www.elasticsearch.com/blog/
> 2010/02/25/nosql_yessearch.html) I concluded that a mix of Riak with
> search supported with ElasticServer might be the perfect combination
> (as described in the blog entry).

Hi Gisli,

the "NoSQL, Yes Search" blog post mentions that ElasticSearch has
_already_ been integrated with the Terrastore NoSQL store, so I'd like
to know why you think Riak would be a better fit/choice: your feedback
will help us improve the integration and understand what's wrong.

Thanks for sharing your thoughts,
Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Re: ElasticSearch vs NoSQL

timrobertson100
I am also interested in the response to the original question.
With ES storing the JSON document, the integration seems kludgy, since
the doc is stored twice over (I'm interested in HBase because of its
MapReduce support for other needs). Would a better integration be to
let ES handle all indexing, store only the DocID in the index, and
hook up the datastore so that ES delegates all GetByKey calls to the
underlying storage system?





Re: ElasticSearch vs NoSQL

Gísli Kristjánsson
Hi Sergio (and Tim),

Terrastore is a very promising alternative, but the reasons I have for
preferring Riak are:
* Riak is more mature
 - Documents
 - Community
 - (Enterprise) Support
* I like the concept of a masterless setup in Riak
* I program in Erlang, and a storage system on the same platform gives
me a good feeling(TM)
* MapReduce is a powerful way to transform/query data

I'll be keeping an eye on Terrastore though as it improves.

Back to my original question (as Tim and I both seem eagerly interested).
If ES is not intended to be the storage system, a solution like the one
Tim suggested is very interesting. Also, can someone explain to me the
difference between an index system (such as ES) that stores the data
(including binaries via the attachment plugin) and a storage system
(like Terrastore)?

Thanks,
Gísli


Re: ElasticSearch vs NoSQL

Gísli Kristjánsson
Also, since I see you're the author of Terrastore: I got the following
error when trying to start the server (after a successful master
startup) on my MacBook Pro:

MacBook-Pro:bin gislik$ sh start.sh --master localhost:9510
Starting Terrastore Server ...
Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:676)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at sun.misc.Launcher$AppClassLoader.findClass(Launcher.java)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:317)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:375)


Re: ElasticSearch vs NoSQL

kimchy
Administrator
In reply to this post by Gísli Kristjánsson
Hi all,

   It's a very interesting question: how to use elasticsearch within your architecture. Let me first explain why elasticsearch stores the JSON source (by default; it can be disabled in the next version). The idea is that when you search, the search request is already executing right where the data is. If you are already local to the data, it makes a lot of sense to also fetch what needs to be displayed in the search results from there. If the _source field is disabled, then for the N hits you get back, you need to execute N fetch requests (or a single batch request, if multi-key fetch is supported) against your data storage to fetch them. If that is OK in terms of latency and overhead on the system, then it's acceptable, assuming that storing the actual source JSON is a big overhead within the index storage.
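
To make the "N fetch requests versus a single batch" trade-off concrete, here is a minimal Python sketch. The `KeyValueStore` class and its `get` / `multi_get` methods are hypothetical stand-ins for whatever datastore sits behind the index; they are not part of elasticsearch or of any real client library, and the round-trip counter only illustrates the cost difference.

```python
class KeyValueStore:
    """Hypothetical stand-in for an external datastore; counts round trips."""

    def __init__(self, docs):
        self._docs = dict(docs)
        self.round_trips = 0

    def get(self, key):
        # One request per key.
        self.round_trips += 1
        return self._docs[key]

    def multi_get(self, keys):
        # One request for the whole batch (only if the store supports it).
        self.round_trips += 1
        return [self._docs[k] for k in keys]


def fetch_one_by_one(store, hit_ids):
    """N hits lead to N fetch requests."""
    return [store.get(doc_id) for doc_id in hit_ids]


def fetch_in_batch(store, hit_ids):
    """N hits lead to a single batched fetch request."""
    return store.multi_get(hit_ids)


if __name__ == "__main__":
    docs = {"1": {"title": "a"}, "2": {"title": "b"}, "3": {"title": "c"}}
    hit_ids = ["3", "1"]  # ids returned by a search with _source disabled

    store = KeyValueStore(docs)
    fetch_one_by_one(store, hit_ids)
    print("one by one:", store.round_trips, "round trips")

    store = KeyValueStore(docs)
    fetch_in_batch(store, hit_ids)
    print("batched:", store.round_trips, "round trip")
```

With the source stored in the index, neither fetch is needed, which is the argument made above for keeping _source enabled.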

   Also note that elasticsearch is a near real time search engine (though I hope to get it to be real time some day). This means that if you index a document, your search/get requests will only see it after a certain interval (which can be configured).
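
As a rough illustration of what "near real time" means for a client, here is a toy Python model in which an indexed document only becomes searchable at the next refresh. The class name, the `refresh_interval` parameter and the explicit `now` timestamps are all assumptions made for this sketch; it does not describe how elasticsearch or Lucene implement refreshes internally.

```python
class NearRealTimeIndex:
    """Toy model: indexed docs become searchable only after a refresh."""

    def __init__(self, refresh_interval=1.0):
        self.refresh_interval = refresh_interval  # seconds, configurable
        self._pending = {}       # indexed, but not yet visible to search
        self._searchable = {}    # visible to search
        self._last_refresh = 0.0

    def index(self, doc_id, text, now):
        # Refresh first, so a new doc always waits for the *next* refresh.
        self._maybe_refresh(now)
        self._pending[doc_id] = text

    def search(self, term, now):
        self._maybe_refresh(now)
        return sorted(i for i, text in self._searchable.items() if term in text)

    def _maybe_refresh(self, now):
        # Publish pending docs once the refresh interval has elapsed.
        if now - self._last_refresh >= self.refresh_interval:
            self._searchable.update(self._pending)
            self._pending.clear()
            self._last_refresh = now


if __name__ == "__main__":
    idx = NearRealTimeIndex(refresh_interval=1.0)
    idx.index("1", "hello world", now=0.5)
    print(idx.search("hello", now=0.9))  # before the refresh
    print(idx.search("hello", now=1.2))  # after the refresh
```

Time is passed in explicitly rather than read from a clock so that the behaviour is easy to follow and to test.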

   As to the question of whether elasticsearch can act as the single nosql solution of choice, namely the main storage of your data: it depends. First, note that elasticsearch is not yet at version 1.0 (I consider it a strong beta; some sites are about to go live with it any day now), so I would consider not using it as the main data storage for now. The simple reason is that, with a separate main store, if something goes really bad you can always reindex the data.

  I have worked on and been involved with several projects that actually used Lucene as the main storage system of their applications, and they were happy with it. Will elasticsearch become a possible main data storage? It depends. If what it provides fits the bill, and it goes GA, then go with it. If not (you need versioning, transactionality), then it can certainly be a complementary solution to your nosql of choice.

  If you do decide to go with Riak, then elasticsearch is certainly a good choice here, as it gives you the ability to have a very rich query model and search on top of your data. As a side note, I know the Riak folks are working on a search engine. Not sure when it is going to come out. But, as skilled as the people in Riak land are (and they really are), I doubt that they can easily build something that can compare to the richness of elasticsearch (and Lucene under it).

  No matter which solution you choose to go with, I would love to cooperate on getting some sort of a plugin built into elasticsearch to automatically index the nosql you work with. Unless, of course, you go with terrastore, which has it built in :).

cheers,
shay.banon



Re: ElasticSearch vs NoSQL

Sergio Bossa
In reply to this post by Gísli Kristjánsson
2010/4/5 Gísli Kristjánsson <[hidden email]>:

> Terrastore is a very promising alternative but the reasons I have for
> prefering Riak are:
> * Riak is more mature
>  - Documents
>  - Community
>  - (Enterprise) Support
> * I like the consept of no master setup in Riak
> * I program in Erlang and a storage system on the same platform is
> good feelingTM
> * MapReduce is a powerful way to transform/query data
>
> I'll be keeping an eye on Terrastore though as it improves.

Got it, those are absolutely valid reasons.
Just to be clear, I didn't want to endorse Terrastore, only to understand
the reasons for your choice: Riak is great, and if it fits your needs
better than the others, just go with it ;)

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Re: ElasticSearch vs NoSQL

Sergio Bossa
In reply to this post by Gísli Kristjánsson
2010/4/5 Gísli Kristjánsson <[hidden email]>:
> Also as I see you're the author of Terrastore I got the following
> error when trying to start the server (after a successful master
> startup) on my MacBook Pro:

It seems to be a problem with your JDK version. Do you mind moving your
question to the Terrastore mailing list? It's off-topic here and I
don't want to annoy ElasticSearch users ;)
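
For anyone who hits the same UnsupportedClassVersionError: it is raised when a class file was compiled for a newer Java version than the running JVM supports. A quick way to check is to read the version from the class file header, which starts with the magic number 0xCAFEBABE followed by the minor and major version. The Python sketch below does that; the function names are made up for this example, and the mapping "major minus 44" only holds for Java 5 and later.

```python
import struct
import sys


def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    # Header layout (big-endian): u4 magic, u2 minor_version, u2 major_version.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major):
    """Map a class file major version to the Java release (Java 5 and later)."""
    if major < 49:
        raise ValueError("class file older than Java 5")
    return major - 44  # 49 -> Java 5, 50 -> Java 6, 51 -> Java 7, ...


if __name__ == "__main__":
    # Usage: python classversion.py path/to/Some.class
    with open(sys.argv[1], "rb") as handle:
        major, minor = class_file_version(handle.read(8))
    print("class file version %d.%d (Java %d)" % (major, minor, java_release(major)))
```

Comparing that number with the output of `java -version` shows whether the JVM is too old for the classes being loaded.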

Thanks!

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Re: ElasticSearch vs NoSQL

Sergio Bossa
In reply to this post by timrobertson100
On Mon, Apr 5, 2010 at 1:33 PM, Tim Robertson <[hidden email]> wrote:

> Would a better integration be to
> allow ES handle all indexing, only store the DocID in the index, and
> hook up the datastore so that ES delegates all GetByKey to the
> underlying storage system?

There's an issue about that; feel free to comment on it:
http://github.com/elasticsearch/elasticsearch/issues#issue/67
I agree it would be great. Maybe I'll find some time to contribute
some code to the already amazing work done by Shay ;)

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Re: ElasticSearch vs NoSQL

Berkay Mollamustafaoglu-2
In reply to this post by Sergio Bossa
"Can someone explain the difference to me between an index system (such as ES) that stores the data (including binaries via attachment plugin) and a storage system (like Terrastore)? "

This is a question that's been on my mind for some time as well. (Shay provided his take on it as I was writing this.) It is clear what ES brings to the table when used alongside a nosql solution. It is harder to pin down what document stores provide that ES does not. I suspect the answer is different for different nosql solutions. It'd be great to hear from users of the various nosql solutions as they get familiar with ES. Shay already pointed out a couple of areas where ES may not be suitable: transactionality and near real time (as opposed to real time). However, most nosql solutions don't have transaction support either. I'm looking forward to getting educated on what else document stores bring to the table :)

Also, as Shay warns, ES is new and not GA, but it leverages mature libraries, which is helpful. The fact that it uses Lucene (a mature library) as the data store is a great comfort, and may be preferable to the relatively untested nature of nosql stores.


Regards,
Berkay Mollamustafaoglu
http://www.ifountain.com
Ph: +1 (571) 766-6292
mberkay on yahoo, google and skype




Re: ElasticSearch vs NoSQL

timrobertson100
> It is harder to pin down what document stores provide that ES does not

With HBase, a huge advantage is the rest of the Hadoop family of products:
- Hive (a SQL "engine" from Facebook) gives
  - the ability to run "reports" such as counts with GROUP BYs on huge data
  - the ability to do huge joins easily (I did a 200 million to 200 million
    join producing >1 billion in under 10 mins)
  - can run on delimited files (e.g. CSVs)
  - has an HBase input format
- MapReduce from Hadoop

There are a few indexing options popping up on HBase, which led me to
search and land on this mailing list. Some investigation shows ES is
a nice candidate to offer the search capabilities missing natively in
HBase, so I am pondering some integration.




Re: ElasticSearch vs NoSQL

Gísli Kristjánsson
In reply to this post by Sergio Bossa
The Terrastore startup error is now an open issue on Terrastore's Google Code project :)

Re: ElasticSearch vs NoSQL

Sergio Bossa
In reply to this post by Berkay Mollamustafaoglu-2
On Mon, Apr 5, 2010 at 4:16 PM, Berkay Mollamustafaoglu
<[hidden email]> wrote:

> It is clear what ES brings
> to the table when used along side with a nosql solution. It is harder to pin
> down what document stores provide that ES does not.

There are certainly a few things that ElasticSearch doesn't
(currently) offer as a storage solution, more specifically:

1) Real-time storage: storing and getting back data depends on
Lucene's near real time capabilities.
2) Durability: indexes aren't durable across node restarts unless you
configure a gateway, whose persistence is, however, snapshot based, so
you may lose the latest data (AFAIU, please correct me if I'm wrong).
3) Performance: Lucene isn't intended as a storage solution; it may or
may not work for your needs, but again, that's not the intended use
(and in my own experience, it doesn't work).

In other words, in order to be a complete storage and indexing
solution on its own, ElasticSearch should IMHO offer separate storage
for its documents, maybe something like an embedded Java Berkeley DB,
but it's not *that* easy and I don't know if it makes sense to provide
from scratch what other (SQL/NoSQL) solutions already do ... but Shay
absolutely has the last word on that ;)

Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Re: ElasticSearch vs NoSQL

kimchy
Administrator
Hi,

  Let me first address point 2, as it's core elasticsearch. ElasticSearch does provide durability. Snapshots to the gateway are interval based, but everything (indices and a transaction log) is maintained across shard replicas. This means that if a node fails, the replicas will make sure everything is snapshotted to the gateway properly. This is how write behind works with most data grid vendors (Coherence, GigaSpaces).
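
The durability argument can be illustrated with a small toy model in Python: writes go to every shard copy, the gateway only sees periodic snapshots, and a surviving copy can still push everything to the gateway after a node is lost. All class and method names here are hypothetical and exist only for this sketch; they are not elasticsearch's actual classes, and applying each write synchronously to every copy is a simplifying assumption.

```python
class ShardCopy:
    """One copy (primary or replica) of a shard, holding its own log."""

    def __init__(self):
        self.translog = []

    def apply(self, op):
        self.translog.append(op)


class Gateway:
    """Hypothetical long-term store that only receives periodic snapshots."""

    def __init__(self):
        self.snapshot = []

    def store(self, ops):
        self.snapshot = list(ops)


class ReplicatedShard:
    def __init__(self, gateway, copies=2):
        self.gateway = gateway
        self.copies = [ShardCopy() for _ in range(copies)]

    def index(self, op):
        # Assumption: a write is applied to all live copies before it returns.
        for copy in self.copies:
            copy.apply(op)

    def fail_copy(self, position):
        # Simulate losing the node that held one copy.
        del self.copies[position]

    def snapshot_to_gateway(self):
        # Runs on an interval; any surviving copy holds the full log.
        if not self.copies:
            raise RuntimeError("all copies lost")
        self.gateway.store(self.copies[0].translog)


if __name__ == "__main__":
    gateway = Gateway()
    shard = ReplicatedShard(gateway, copies=2)
    shard.index("create doc 1")
    shard.snapshot_to_gateway()
    shard.index("create doc 2")   # not yet in any snapshot
    shard.fail_copy(0)            # a node dies before the next snapshot
    shard.snapshot_to_gateway()   # the surviving copy catches the gateway up
    print(gateway.snapshot)
```

The window of possible loss in this model is the case where every copy of a shard disappears between two snapshots.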

  Performance-wise, well, it all depends on in-memory caching, as anybody who has used Berkeley DB when not all of its btree nodes fit in memory knows :). Currently, Lucene should be as fast as Berkeley DB assuming data resides on disk (and faster on SSDs, and faster yet with in-memory storage, thanks to how it works). One thing that I plan to add is caching on other levels than just the query filter cache, but to be honest, most of the time it's not really needed...

  ElasticSearch by no means aims to replace other nosql solutions. It's going to evolve and provide its own features. If they fit the bill, great. If not, elasticsearch is going to integrate well with most nosql solutions out there. How well? I really hope that by 1.0, elasticsearch will be able to automatically index data in most common nosql solutions (with the help of the community; I will write the first one ;) ).

cheers,
shay.banon



Re: ElasticSearch vs NoSQL

alexandre gerlic
2010/4/6 Shay Banon <[hidden email]>:
> Hi,
>   Let me first address point 2, as its core elasticsearch. ElasticSearch
> does provides durability. Snapshots pointed at gateway are interval based,
> but everything (indices and a transaction log) are maintained across shard
> replicas. This means that if a node fails, then the replicas will make sure
> everything is snapshotted to the gateway properly. This is how write behind
> works in most data grid vendors (coherence, gigaspaces).
>

Hi,

To integrate Cassandra and ES, I am currently working along these lines:
- put/remove on Cassandra will call ES via the Java API (same behavior as
in the blog post "NoSQL, Yes Search")
- create CassandraGateway and CassandraIndexGateway
- gateway_index_snapshot disabled
- gateway_index_recover created from Cassandra: create the Translog (only
CREATE instructions) from Cassandra

Apart from the _source disabling issue, the point is to avoid duplicating
data between the nosql solution and ES.
If the ES cluster crashes, I hope this solution will help me recreate the
ES cluster directly from the database
instead of from the file system.
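
The general shape of this approach (every put/remove on the store also updates the index, and the index can be rebuilt from the store using only "create" operations) can be sketched in Python as follows. Plain dicts stand in for both Cassandra and ES, the `IndexedStore` class is hypothetical, and nothing here uses the real Cassandra or elasticsearch APIs or the gateway classes named above.

```python
class IndexedStore:
    """Toy dual-write wrapper: a dict as datastore, a dict as inverted index."""

    def __init__(self):
        self.store = {}   # stand-in for the primary datastore
        self.index = {}   # stand-in for the search index: term -> set of keys

    def _index_doc(self, key, text):
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(key)

    def put(self, key, text):
        # Writing to the store also (re)indexes the document.
        self.remove(key)
        self.store[key] = text
        self._index_doc(key, text)

    def remove(self, key):
        # Removing from the store also removes the document from the index.
        old = self.store.pop(key, None)
        if old is None:
            return
        for term in old.lower().split():
            keys = self.index.get(term)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.index[term]

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

    def recover_index(self):
        # Rebuild the index from the store, replaying only "create" operations.
        self.index = {}
        for key, text in self.store.items():
            self._index_doc(key, text)


if __name__ == "__main__":
    s = IndexedStore()
    s.put("k1", "Riak and ElasticSearch")
    s.put("k2", "HBase and Hive")
    s.index = {}          # simulate losing the whole index
    s.recover_index()     # recreate it from the datastore
    print(s.search("and"))
```

Recovery cost in this model grows with the size of the store, since every document is indexed again.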

--
Alexandre Gerlic

Re: ElasticSearch vs NoSQL

kimchy
Administrator
On Tue, Apr 6, 2010 at 2:47 AM, alexandre gerlic <[hidden email]> wrote:

> to interact between cassandra and ES, I am currently working on this way :
> - put/remove on Cassandra will call ES via Java API (same behavior as
> blog post "NoSQL, Yes Search"

Nice! I'm wondering about edge cases here given how Cassandra works (I know it in theory and partly from the code). Would love to see some code if you have it.

> - create CassandraGateway and CassandraIndexGateway
> - gateway_index_snapshot disabled
> - gateway_index_recover created from Cassandra : create Translog (only
> CREATE instructions) from Cassandra

I think it would make sense to store the full index and the transaction log on Cassandra. Rebuilding the index is not something that you would want to do. Storing the index itself is a simple matter of simulating a file system on top of the Cassandra API.

> Except _source disabled issue, the fact is to avoid to double data
> between nosql solution and ES.

I have explained why I think storing the _source in elasticsearch still makes sense. But of course, the option is there to disable it.

> If ES cluster crash, I hope this solution will help me to recreate ES
> cluster directly from database
> instead of file system.

I think that if you store the index itself on Cassandra as well, then even if the whole elasticsearch cluster crashes, you won't have to reindex the data. That's the general idea.
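
The idea of simulating a file system on top of a key-value API, mentioned in this reply, can be sketched roughly as below. This is only an illustration in Python: a plain dict stands in for Cassandra, the `KeyValueDirectory` class and its chunking scheme are assumptions of this sketch, and no real Cassandra or Lucene API is used.

```python
class KeyValueDirectory:
    """Toy file system: each file is split into chunks stored under (name, n)."""

    def __init__(self, kv, chunk_size=4):
        self.kv = kv                  # any dict-like key-value store
        self.chunk_size = chunk_size  # tiny on purpose, to show the chunking

    def write_file(self, name, data):
        # Overwrite semantics: drop the old chunks, then store the new ones.
        self.delete_file(name)
        chunks = [data[i:i + self.chunk_size]
                  for i in range(0, len(data), self.chunk_size)]
        for n, chunk in enumerate(chunks):
            self.kv[(name, n)] = chunk
        self.kv[(name, "meta")] = len(chunks)  # chunk count marks the file

    def read_file(self, name):
        count = self.kv[(name, "meta")]
        return b"".join(self.kv[(name, n)] for n in range(count))

    def delete_file(self, name):
        count = self.kv.pop((name, "meta"), 0)
        for n in range(count):
            del self.kv[(name, n)]

    def list_files(self):
        return sorted(name for (name, part) in self.kv if part == "meta")


if __name__ == "__main__":
    directory = KeyValueDirectory({}, chunk_size=4)
    directory.write_file("segments_1", b"0123456789")
    print(directory.list_files())
    print(directory.read_file("segments_1"))
```

With index files stored this way, a restarted cluster could read the existing files back instead of reindexing, which is the point made above.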


Re: ElasticSearch vs NoSQL

Eks Dev
I just started playing with ES and had to comment on this subject.

IMO, the question in the subject line is plain wrong (the discussion is
great, though!). What I would like to see somewhere is rather search and
"nosql db". Keeping these two topics apart is like saying: OK, let us
separate the DBMS from indexing and SQL. Search is great, nosql dbs are
great, but not enough on their own.

"Traditional search" is just one application; useful, but just one
application. A more traditional, and much more general, computation model
is to have some way to locate data (old way "SQL", new way "search"),
retrieve data (old way "SQL", new way nosql KV stores), do something with
the data (SQL vs. map-reduce today on mega-data) and put it back into
storage or deliver it outside.

What I am trying to say is that the "new way" has one missing link: it
keeps data in two completely separate worlds, technologically and
logically apart (think e.g. hbase and ES, or cassandra and solr). This is
expensive, hard to set up, hard to keep in sync, duplicates demand on
resources ...

In an ideal world, imagine hbase where each node keeps an embedded lucene
to expose the search part, with all the magic Shay is doing with ES. This
would become one infrastructure to keep all players in sync, one set of
APIs to talk to clients... It seems riak is going this way.

I think I see this way of thinking behind ES, so imagine ES doing
map-reduce, keeping your data safe like hbase... :)


Dreaming in public is, I guess, OK

Cheers,
Eks

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/ElasticSearch-vs-NoSQL-tp696971p2694954.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Re: ElasticSearch vs NoSQL

Kosta
In reply to this post by Eks Dev
I'm glad this discussion took off, as it is something that I have been
pondering for a while now as well.

For my latest project I started off with a large tech stack... web
framework, database, message queue, distributed file system, search
index etc. Life was great, everything was modular and decoupled and it
was all going to fit into place beautifully. In the test environment I
set up half a dozen virtual machines, each running its own component
so that I had nice isolation and could easily pinpoint bottlenecks.

Unfortunately, marvelling over this grandiose architecture was
short-lived. It wasn't long before I started feeling the pain of keeping up
with the latest and greatest for each of these components and learning
their hidden pitfalls and secrets. Then I started thinking about
scaling out, and even though all these components were elastic,
cloud-ready and <insert buzzword>, each one had a different way of sharding
and replicating. So now I had to learn how to scale, monitor,
optimize, back up and configure 4 different technologies written in
different languages and having different dependencies. My head started
to hurt; it was time for a change of plan, for a new mantra:
sometimes simpler is better!

In my particular case, I used mongo as my nosql store and I was
definitely seeing a bit of an overlap between it and ES. The type of
data was simple and I didn't have a need for map-reduce operations or
complex set relations (otherwise I wouldn't be using a nosql solution
in the first place!). I just needed a flexible data model and a
fine-grained way to search & retrieve documents, which is what
ElasticSearch was made for in the first place. The fact that I could
partition and replicate my data using ElasticSearch, in a way
reminiscent of mongo, made the question of why I needed both even more obvious.
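
For readers new to the partition-and-replicate idea, a toy sketch of hash-based placement follows. It is not how ES or mongo actually route documents; the shard count, replica count, node names and both helper functions are hypothetical.

```python
import hashlib

NUM_SHARDS = 3      # hypothetical number of partitions
NUM_REPLICAS = 1    # hypothetical extra copies per shard
NODES = ["node-a", "node-b", "node-c"]


def shard_for(doc_id):
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def nodes_for(shard):
    """Place the primary and its replicas on consecutive nodes."""
    return [NODES[(shard + i) % len(NODES)] for i in range(NUM_REPLICAS + 1)]


# Every document id maps to one primary node plus one replica node.
placement = {
    doc_id: nodes_for(shard_for(doc_id))
    for doc_id in ["user:1", "user:2", "post:9"]
}
```

The same id always lands on the same shard, and each shard lives on two different nodes, so losing one node does not lose the data.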

So I took the plunge and decided to ditch mongo for the time being and
use ES as my primary form of storage. I asked around on groups and
forums and couldn't find any glaringly obvious problem with using
Lucene as a storage engine. I also looked at Terrastore briefly but
couldn't really see from the architecture diagrams what it uses for
persistence. I assume Terracotta; but based on what I have read so far,
Terracotta is not really well suited for permanent data but rather for
throw-away data. It was interesting to see Sergio mentioning under one
of his points that "Lucene isn't intended as a storage solution; it
may or may not work for your needs, but again, that's not the intended
use (and in my own experience, it doesn't work)". I think this is
worth analysing, with real use cases and war stories of particular
situations where Lucene was not a good storage solution and didn't
work (and how Terrastore addresses and solves them).

TL;DR Large tech stacks can quickly turn into administrative/learning
nightmares. Sometimes the benefits of integrating multiple solutions
into one component can far outweigh the risks and problems, especially
in a case like this where many people are confused and already see an
overlap (i.e. using ES as a nosql store).


On Mar 17, 7:29 pm, Eks Dev <[hidden email]> wrote:

> I just started playing with ES and had to comment on this subject.
>
> IMO, the question in the subject line (the discussion is great!) is plain wrong. What I
> would rather see somewhere is search and "nosql db" together. Keeping these
> two topics apart is like saying, OK, let us separate the DBMS from indexing and
> SQL. Search is great, nosql DBs are great, but neither is enough on its own.
>
> "Traditional search" is just one application: useful, but just one
> application. A more traditional, and much more general, computation model is to
> have some way to locate data (old way "SQL", new way "search"), retrieve
> data (old way "SQL", new way nosql KV stores), do something with the data (SQL
> vs. map-reduce today on mega-data) and put it back into storage or deliver it
> outside.
>
> What I am trying to say is that the "new way" has one missing link: it keeps data in
> two completely separate worlds, technologically and logically apart (think
> e.g. HBase and ES, or Cassandra and Solr). This is expensive, hard to set up,
> hard to keep in sync, and duplicates the demand on resources ...
>
> In an ideal world, imagine HBase where each node keeps an embedded Lucene to
> expose the search part, with all the magic Shay is doing with ES. This would
> become one infrastructure to keep all players in sync, one set of APIs to
> talk to clients... It seems Riak is going this way.
>
> I think I see this way of thinking behind ES, so imagine ES doing
> map-reduce, keeping your data safe like HBase... :)
>
> Dreaming in public is, I guess, OK.
>
> Cheers,
> Eks
>
> --
> View this message in context:http://elasticsearch-users.115913.n3.nabble.com/ElasticSearch-vs-NoSQ...
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Re: ElasticSearch vs NoSQL

Kosta
Just realized that Eks replied to a year-old thread... Sorry for joining the thread resurrection like this, but I guess that makes it still somewhat relevant a year later :)