Solr vs ES: performance

mfeingold
In my quest for a search platform for our internal needs I am playing
with both Solr and ES. I indexed the same data (~4.5M documents) with
the same index structure on each and ran the same query on both.

What surprises me is that, for whatever reason, ES search is 1.5 times
slower than Solr. Also, the Solr index size is 650MB vs 2.1GB for ES.

Can you help me figure out what I am missing here?

ES configuration: https://gist.github.com/1118703

Solr Schema: https://gist.github.com/1118711

Query: https://gist.github.com/1118864

Re: Solr vs ES: performance

Paul Loy
One immediate difference is that the default number of shards in ES is 5 and the default number of replicas is 1 (i.e. a master and one copy).

The replication factor will mean 2x the storage. Also, I think you get a local gateway out of the box, which gives you another copy of all the shards and replicas, making for 4x the actual index size. 2.1GB is pretty much 4x 650MB and so is expected.

Given 5 shards, a query will take longer as it has to map/reduce across the 5 shards.

So for a single-node, single-shard, like-for-like test you should set shards to 1 and replicas to 0. Then they'll be comparable. But then, of course, you have negated the reason why you would choose ES in the first place, which is to increase write throughput and to make your index scalable and much more available. :)
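
A minimal sketch of the like-for-like setup suggested above, assuming an Elasticsearch node on localhost:9200 and a hypothetical index name `bench` (neither appears in this thread). It only builds the create-index request with one shard and no replicas; nothing is sent unless the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def single_shard_settings():
    """Index settings for a one-shard, zero-replica index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
            }
        }
    }


def build_create_index_request(base_url, index_name):
    """Build (but do not send) the PUT request that creates the index."""
    body = json.dumps(single_shard_settings()).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values, for illustration only.
request = build_create_index_request("http://localhost:9200", "bench")
# urllib.request.urlopen(request)  # uncomment to send to a running node
```

The number of shards is fixed when the index is created, so the data has to be indexed again into the new index before the query is re-run.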

Cheers,

--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy

Re: Solr vs ES: performance

kimchy
Administrator
In reply to this post by mfeingold
A few notes:

* elasticsearch stores the JSON document itself in the index, which can explain the size difference. Also, I am not sure what the Solr configuration is when it comes to merging.

* By default, elasticsearch creates an _all field that aggregates all the fields into a single searchable field. This might explain the size difference (and possibly a slower indexing rate). You can easily disable it: http://www.elasticsearch.org/guide/reference/mapping/all-field.html.

* By default, when you create an index in elasticsearch, it already has 5 shards, so a search executes across those 5 shards (which might also explain why the index size is bigger). If you want to compare it to a single Solr instance, then either create an index with a single shard in elasticsearch, or start a 5-core Solr server and do distributed search across the cores (this is where you will see a big difference, I suspect, as Solr does not do distributed search as well as elasticsearch).

* Measuring performance by repeating a single identical query is problematic. elasticsearch, by design, does not have the caches that Solr has, and those caches come heavily into play in a same-query perf test. The reason is that, under actual varied usage, the overhead of those caches (doc cache, query cache) is problematic when it comes to garbage collection in the JVM and concurrency.
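
The last point can be sketched as a small timing harness that runs a varied set of queries rather than one repeated query, so result caches on either side matter less. This is only an illustration: `run_query` is a hypothetical callable you would supply for Solr or elasticsearch (for example, a function that sends an HTTP search request), and the sample terms are made up.

```python
import math
import random
import statistics
import time


def time_queries(run_query, queries, rounds=3, seed=0):
    """Return per-call latencies in seconds for a varied query set."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(rounds):
        batch = list(queries)
        rng.shuffle(batch)  # use a different query order in each round
        for query in batch:
            start = time.perf_counter()
            run_query(query)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Count, median, and nearest-rank 95th percentile of the latencies."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


# Stand-in search function so the sketch runs without a server.
calls = []


def fake_search(query):
    calls.append(query)


sample_terms = ["apache", "lucene", "shard", "replica"]
stats = summarize(time_queries(fake_search, sample_terms, rounds=3))
print(stats)
```

In a real comparison the same query list and the same number of rounds would be run against both servers, ideally with far more distinct queries than this sample.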

Re: Solr vs ES: performance

kimchy
Administrator
In reply to this post by Paul Loy
Ha! Paul already answered some of it :) Let me just correct a few things, Paul :)

On Mon, Aug 1, 2011 at 11:33 PM, Paul Loy <[hidden email]> wrote:
> One immediate difference would be that the default number of shards in ES is 5 and the default number of replicas is 1 (i.e. a master and one copy).
>
> The replication factor will mean 2x the storage.

That only affects things if you have more than 1 node, as elasticsearch won't allocate a shard and its replica on the same node.

> Also, I think you get a local gateway out of the box, so that gives you another copy of all the shards and replicas making for 4x the actual index size. 2.1Gb is pretty much 4x 650Mb and so is expected.

The local gateway does *not* create another copy of the data; it uses the same data directory that your indices reside in.