elasticsearch vs solr : indexing speed

massi
Hi Guys,

What do you think of this article:
http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deathmatch/
which compares elasticsearch and Solr with regard to indexing
speed?

A quote from the article: "I ran each test 4 times, killing the JVM
and removing the data directory for both Solr and elasticsearch. The
final averaged results expressed as throughputs were 43204 docs/sec
for Solr, 44052 docs/sec for Solr direct streaming, and 9823 docs/sec
for elasticsearch."
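The quote doesn't say how the four runs were combined into one "averaged" throughput. As a hedged sketch (the function names are hypothetical, not from the article), these are the two usual ways to do it. They give different numbers whenever the runs take different amounts of time, which is one more detail a benchmark write-up should state.

```python
from statistics import mean


def mean_of_rates(docs_per_run, seconds_per_run):
    """Arithmetic mean of the per-run throughputs, in docs/sec."""
    return mean(d / s for d, s in zip(docs_per_run, seconds_per_run))


def pooled_rate(docs_per_run, seconds_per_run):
    """Total documents divided by total elapsed time, in docs/sec."""
    return sum(docs_per_run) / sum(seconds_per_run)
```

When every run indexes the same number of documents, the pooled rate equals the harmonic mean of the per-run rates, so a single slow run pulls it down more than it pulls down the arithmetic mean.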

PS: Don't get me wrong, I know it is only one (partial) test,
and that some of elasticsearch's features make it unique!

Re: elasticsearch vs solr : indexing speed

Clinton Gormley
Hiya

> What do you think of this article:
> http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deathmatch/
> where elasticsearch and solr are compared with regard to the indexing
> speed?

I've posted a reply (currently awaiting moderation), but his benchmark is
severely flawed. For example, he wasn't actually indexing what he thought
he was indexing.

With a few simple changes, I got much better performance out of ES than
he was getting.

On a side note, it seems refresh_interval is not being respected in
0.15.2, which would also decrease raw indexing speed.
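For context, refresh_interval is an index-level setting that controls how often newly indexed documents are made visible to search; refreshing less often generally leaves more time for raw indexing. A hedged Python sketch of the JSON body one might send to the index update-settings endpoint around a bulk load follows. The "-1" value (assumed here to disable periodic refresh) and the endpoint's behaviour in 0.15.x are assumptions, and per the note above the setting may not be honored in 0.15.2 anyway.

```python
import json


def refresh_settings_body(interval):
    """Build an update-settings body for the index-level refresh_interval.

    interval: e.g. "1s" for a one-second refresh, or "-1" (assumed) to
    disable periodic refresh while bulk loading.
    """
    return json.dumps({"index": {"refresh_interval": interval}})


# Hypothetical usage: turn refresh off before a bulk load, restore it afterwards.
before_load = refresh_settings_body("-1")
after_load = refresh_settings_body("1s")
```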

clint

Re: elasticsearch vs solr : indexing speed

K.B.
In reply to this post by massi
If you look at

{"add":{"doc":{ "id":"1582039702", "field1_s":"1184645701" }}

for Solr, compared to

{"index": {"_index":"test", "_type":"type1", "_id":"1582039702", "field1":"1184645701" }}

for ES, he can't be serious. It's also unclear how the fields were
treated and configured, as no configuration options were stated.
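In the elasticsearch bulk format, each action line (such as {"index": {...}}) carries only metadata and is followed by a separate line holding the document source, with every line ending in a newline. The ES line quoted above puts field1 inside the action metadata instead. A minimal Python sketch, assuming the article's index name "test", type "type1" and two-field documents (the function name is hypothetical), that builds a correctly shaped bulk body:

```python
import json


def es_bulk_body(docs, index="test", doc_type="type1"):
    """Build an elasticsearch bulk request body.

    docs: iterable of (doc_id, field1_value) pairs.
    Each document becomes two lines: an action line holding only
    metadata, then the document source itself.
    """
    lines = []
    for doc_id, value in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        source = {"field1": value}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last one, to end in "\n".
    return "".join(line + "\n" for line in lines)
```

For example, es_bulk_body([("1582039702", "1184645701")]) yields the action line followed by {"field1": "1184645701"} on its own line.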

From my own ES usage, I know ES can index 1500 docs, each containing 45
fields (some of them very long text fields with up to 10 000 chars), in
under 0.6 seconds. So if I only consider 2 fields here and take 1500 * 45
fields in 0.6 secs, I would expect ES to handle at least about 56 000 of
his 2-field demo docs per second without any problems.
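The estimate above is a straight linear extrapolation: it assumes indexing cost scales with the number of fields and ignores per-document overhead and field length. Under that loose assumption, the arithmetic works out as follows (roughly 56 000 two-field docs per second, against the article's 9823 docs/sec):

```python
from fractions import Fraction

docs = 1500
fields_per_doc = 45
seconds = Fraction(6, 10)  # "under 0.6 seconds", kept exact to avoid float rounding

fields_per_second = docs * fields_per_doc / seconds  # 67 500 fields in 0.6 s
two_field_docs_per_second = fields_per_second / 2    # two fields per demo doc

print(int(fields_per_second), int(two_field_docs_per_second))  # → 112500 56250
```

Because 0.6 seconds is an upper bound on the measured time, the result is a rough lower bound under the linear assumption, not a measured figure.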


On 17 Apr., 03:56, massi <[hidden email]> wrote:

> [...]

Re: elasticsearch vs solr : indexing speed

Otis Gospodnetic
In reply to this post by massi
Hi,

I wouldn't pay much attention to that post/benchmark. A good
benchmark needs to publish a lot more details than the above, starting
with basic stuff like the JVM heap size (-Xmx). I'm also of the opinion
that if you are going to publish a benchmark comparing two pieces of
software, you had better invite experts from both sides and let them
tune and optimize things.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



On Apr 16, 9:56 pm, massi <[hidden email]> wrote:

> [...]

Re: elasticsearch vs solr : indexing speed

kimchy
Administrator
Heya,

Here is Clinton's answer: https://gist.github.com/0382ed3913f0c3e40d62
I'd like to add the following:

1. To fully compare the two in terms of indexing overhead, at least for this very simple doc, the _source and _all fields need to be disabled.
2. The Solr type used for field1 corresponds, in ES terms, to index set to not_analyzed and omit_norms set to true. The ES mapping for field1 should use the same settings.
3. ES also indexes two additional fields, _id and _type. For a real comparison, they should have index set to no. The only thing you lose by doing so is the ability to query them at search time (this is in master).
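A hedged sketch of a type mapping along the lines of points 1 to 3, written as a Python dict and serialized to JSON. The option names ("enabled" on _source/_all, "index": "not_analyzed", "omit_norms", and "index": "no" on _id/_type) are assumed to follow the 0.x-era mapping syntax; the type name "type1" comes from the article, and whether _id/_type accept "index": "no" depends on the version (the message notes it is in master). The function name is hypothetical.

```python
import json


def comparable_mapping(doc_type="type1"):
    """Mapping that strips ES-specific indexing work for a like-for-like test."""
    return {
        doc_type: {
            # Point 1: don't store the original JSON or build the catch-all field.
            "_source": {"enabled": False},
            "_all": {"enabled": False},
            # Point 3: keep _id and _type out of the inverted index.
            "_id": {"index": "no"},
            "_type": {"index": "no"},
            "properties": {
                # Point 2: match Solr's string type (untokenized, no norms).
                "field1": {
                    "type": "string",
                    "index": "not_analyzed",
                    "omit_norms": True,
                },
            },
        }
    }


print(json.dumps(comparable_mapping(), indent=2))
```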

I posted a sample as a comment on Clinton's post.

Some more aspects of how ES works differently from Solr:

1. When you index data, it's there. If you "kill -9" ES (even a single server) and start it back up, all data indexed up to that point will be there with the local gateway (this is not done by committing Lucene on each change, as that would not scale). Solr, on the other hand, will lose all changes since the last commit. This does come with a (small) overhead.
2. The elasticsearch bulk API format is optimized for distributed execution, where the bulk request needs to be sliced and diced so that each item is sent to the correct shard. This comes with some overhead compared to a single big JSON that is parsed and processed in a single-shard scenario, but it proves crucial when working with several shards.
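To illustrate the "slice and dice" step in point 2, here is a minimal Python sketch that groups bulk items by target shard. It picks a shard as a hash of the _id modulo the number of shards; zlib.crc32 is a stand-in for elasticsearch's actual routing hash, which is not reproduced here, and both function names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Pick a shard for a document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def split_bulk(items, num_shards):
    """Group (doc_id, source) bulk items into per-shard sub-requests."""
    per_shard = defaultdict(list)
    for doc_id, source in items:
        per_shard[shard_for(doc_id, num_shards)].append((doc_id, source))
    return dict(per_shard)
```

With num_shards=1 every item lands in the same group, which corresponds to the single-shard scenario the message compares against; with several shards, each group becomes a separate request to a different shard.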

-shay.banon

On Monday, April 18, 2011 at 5:56 AM, Otis wrote:

> [...]

Re: elasticsearch vs solr : indexing speed

medcl.net
great~
 
Sent: Monday, April 18, 2011 12:09 PM
Subject: Re: elasticsearch vs solr : indexing speed
 
> [...]

---------------------
Medcl
http://log.medcl.net