Hi Guys,
What do you think of this article:
http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deathmatch/
where elasticsearch and Solr are compared with regard to indexing speed?

A quote from the article: "I ran each test 4 times, killing the JVM and removing the data directory for both Solr and elasticsearch. The final averaged results expressed as throughputs were 43204 docs/sec for Solr, 44052 docs/sec for Solr direct streaming, and 9823 docs/sec for elasticsearch."

PS: Don't get me wrong, I know that it is only one (partial) test, and that some features in elasticsearch make it unique!
|
Hiya
> What do you think of this article:
> http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deathmatch/
> where elasticsearch and solr are compared with regard to the indexing
> speed?

I've posted a reply (currently awaiting moderation), but his benchmark is severely flawed. E.g., he wasn't actually indexing what he thought he was indexing. With a few simple changes, I got much better performance out of ES than he was getting.

On a side note, it seems refresh_interval is not being respected in 0.15.2, which would also decrease raw indexing speed.

clint
|
In reply to this post by massi
If you look at:

{"add":{"doc":{ "id":"1582039702", "field1_s":"1184645701" }}

for Solr, compared to

{"index": {"_index":"test", "_type":"type1", "_id":"1582039702", "field1":"1184645701" }}

for ES, he can't be serious. It's also not clear how the fields were treated and configured, as no config options were stated.

From my own ES usage I know ES can index 1500 docs, each containing 45 fields (some very long language fields with up to 10 000 chars), in under 0.6 seconds. So if I consider just 2 fields here and take 1500 * 45 fields in 0.6 secs, I would expect ES to handle at least about 56 000 of his 2-field demo docs per second (1500 * 45 / 0.6 / 2) without any problems.

On 17 Apr., 03:56, massi <[hidden email]> wrote:
> Hi Guys,
>
> What do you think of this article: http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deat...
> where elasticsearch and solr are compared with regard to the indexing
> speed?
>
> A quote from the article: "I ran each test 4 times, killing the JVM
> and removing the data directory for both Solr and elasticsearch. The
> final averaged results expressed as throughputs were 43204 docs/sec
> for Solr, 44052 docs/sec for Solr direct streaming, and 9823 docs/sec
> for elasticsearch."
>
> PS: Don't take me wrong, I know that it is only one (partial) test,
> and that some features in elasticsearch make it unique!
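The back-of-envelope estimate in the reply above (1500 docs of 45 fields each in under 0.6 seconds, scaled down to 2-field docs) can be sketched in a few lines of Python. This is only a rough sketch: it assumes indexing cost is proportional to the number of fields and ignores field length, analysis and per-document overhead, none of which the post measures. The function name `estimated_docs_per_sec` is made up for illustration.

```python
# Figures quoted in the reply above; nothing here was measured independently.
DOCS = 1500            # documents indexed in one batch
FIELDS_PER_DOC = 45    # fields per document
SECONDS = 0.6          # the batch took "under 0.6 seconds"


def estimated_docs_per_sec(fields_per_small_doc: int) -> float:
    """Scale the observed field throughput down to a smaller document.

    Assumption: cost grows linearly with field count and nothing else.
    """
    fields_per_sec = DOCS * FIELDS_PER_DOC / SECONDS
    return fields_per_sec / fields_per_small_doc


# Estimated throughput for the 2-field benchmark documents.
print(round(estimated_docs_per_sec(2)))
```

Under that assumption the estimate comes out at a bit over 56,000 docs/sec, which is the same ballpark as the figure in the post and well above the 9823 docs/sec reported in the article.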
|
In reply to this post by massi
Hi,
I wouldn't pay much attention to that post/benchmark. A good benchmark needs to publish a lot more details than the above, starting with basic stuff like -Xmx. I'm also of the opinion that if you are going to publish a benchmark comparing 2 pieces of software, then you had better invite experts from both sides and let them tune and optimize things.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

On Apr 16, 9:56 pm, massi <[hidden email]> wrote:
> Hi Guys,
>
> What do you think of this article: http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deat...
> where elasticsearch and solr are compared with regard to the indexing
> speed?
>
> A quote from the article: "I ran each test 4 times, killing the JVM
> and removing the data directory for both Solr and elasticsearch. The
> final averaged results expressed as throughputs were 43204 docs/sec
> for Solr, 44052 docs/sec for Solr direct streaming, and 9823 docs/sec
> for elasticsearch."
>
> PS: Don't take me wrong, I know that it is only one (partial) test,
> and that some features in elasticsearch make it unique!
|
Heya,

Here is Clinton's answer: https://gist.github.com/0382ed3913f0c3e40d62, and I'd like to add to that:

1. In order to completely compare the two in terms of overhead when indexing, at least for this very simple doc, the _source and _all fields need to be disabled.
2. The type used for field1 in Solr corresponds, in ES terms, to index set to not_analyzed and omit_norms set to true. ES should be configured the same way.
3. Again, ES will index two additional fields, _id and _type. To really compare, they should have index set to no. When doing so, the only thing one loses is the ability to query them at search time (this is in master).

I posted a sample as a comment on Clinton's post.

Some more aspects of how ES works differently from Solr:

1. When data is indexed, it's there. If you "kill -9" ES (even with a single server) and start it back up, all data indexed up until that point will be there with the local gateway (this is not done by committing Lucene on each change, as that would not scale). Solr, on the other hand, will lose all changes since the last commit. This does come with a (small) overhead.
2. The bulk API format for elasticsearch is more optimized for distributed execution, where it needs to be sliced and diced in order to point the bulk items to the correct shards. This does come with an overhead compared to a single big JSON that is parsed and processed in a single-shard scenario, but it proves crucial when working with several shards.

-shay.banon
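For anyone trying to reproduce the comparison, here is a rough Python sketch of what points 1-3 and the bulk-format remark in the reply above might look like in practice. It is not the sample posted on Clinton's gist. The index, type and field names are taken from the benchmark payload quoted earlier in the thread. The mapping keys are written as I understand mappings of that era, and the `_id` / `_type` entries are described in the post as being in master only, so check them against the version you run. The helpers `bulk_body`, `shard_for` and `split_by_shard` are hypothetical, and CRC32 is only a stand-in for whatever hash ES really uses to route a document to a shard.

```python
import json
import zlib

# Mapping sketch for points 1-3: no _source, no _all, field1 not analyzed
# and without norms. The _id/_type entries are an assumption (master only).
mapping = {
    "type1": {
        "_source": {"enabled": False},
        "_all": {"enabled": False},
        "_id": {"index": "no"},
        "_type": {"index": "no"},
        "properties": {
            "field1": {
                "type": "string",
                "index": "not_analyzed",
                "omit_norms": True,
            }
        },
    }
}


def bulk_body(docs, index="test", doc_type="type1"):
    """Build a newline-delimited bulk body: an action line, then a source line, per doc."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))  # the document fields go on their own line
    return "\n".join(lines) + "\n"


def shard_for(doc_id, num_shards):
    """Hypothetical routing: CRC32 stands in for the real routing hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def split_by_shard(body, num_shards):
    """Group (action, source) line pairs by shard, parsing only the action lines."""
    lines = body.splitlines()
    groups = {}
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        doc_id = json.loads(action_line)["index"]["_id"]
        groups.setdefault(shard_for(doc_id, num_shards), []).append(
            (action_line, source_line)
        )
    return groups


body = bulk_body([("1582039702", {"field1": "1184645701"})])
print(json.dumps(mapping, indent=2))
print(body, end="")
```

The point of the line-per-item layout is visible in `split_by_shard`: the source lines can be passed along untouched, since only the small action line has to be parsed to decide where an item goes. That is the slicing overhead, and the multi-shard benefit, that the reply describes.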
|
great~
From: [hidden email]
Sent: Monday, April 18, 2011 12:09 PM
To: [hidden email]
Subject: Re: elasticsearch vs solr : indexing speed
