ES performance questions

Chris Scribner
Hi,

Some co-workers and I have been testing out ES recently. I wanted
to ask about some observations we've made.

Test server: 2x core i5, 4 GB RAM, 5400 RPM hard disk, Mac OS X Lion

We attempted to insert 2 million documents with a name, description,
and score. The index size was about 3 gigs.

Searching the description field for two common words (each common
word appears in about 10% of documents) takes about 6-7 seconds
to return 500 results, ordered by score. It takes about 2 seconds to
return the top 50 results.

Given the specs of the computer running the search, this doesn't seem
terrible. But while a search is running, we notice that ElasticSearch is
using only a few percent of CPU time and less than 150 MB of RAM, even
though much more is available.

That behavior makes me think the query latency is mainly time to read
from the hard disk. But, I'm curious why ES isn't trying to use more
RAM to make the query faster.

I'm wondering if this all sounds normal, and whether there's anything
we can do to optimize this particular type of search. We changed the
index mapping to store the name, description, and score. In this case,
we don't care about the total number of matches found, if that makes a
difference.

Thanks,

Chris

Re: ES performance questions

Patrick-4
Hey Chris,

What do your Java command-line options look like? Also, did you check how much I/O it was doing at the time? Did you vary the search, or were you searching for similar or identical terms?

Patrick

Re: ES performance questions

Chris Scribner
We tried passing -Xmx2g. We didn't check I/O stats on the process.

We varied the search terms on each query. Repeating the same search
terms is very quick, as expected (but that is not a particularly
important use case for us).

Chris


Re: ES performance questions

Paul Loy
Did you let the caches warm up before taking timings? As with all perf tests, you should always let the JVM (HotSpot) warm up, as well as all the ES good stuff.

Something else you can do is add more boxes to distribute the queries. If disk I/O is your bottleneck, sharing the load will speed things up. In particular, if your data is ~3 GB but your memory is only 2 GB, not all of your data may fit in memory. If every query requires disk access, then you need more RAM, either on that one machine or by adding more machines.

Paul.
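
The warm-up advice above could be sketched roughly as follows. This is a minimal illustration, not anyone's actual test harness: the function name time_after_warmup, the run counts, and the stand-in workload are all assumptions. In a real test, run_query would issue a search against the cluster, ideally with varied terms so that repeated identical queries do not dominate the result.

```python
import statistics
import time
from typing import Callable, List


def time_after_warmup(run_query: Callable[[], object],
                      warmup_runs: int = 20,
                      measured_runs: int = 50) -> List[float]:
    """Run the query untimed warmup_runs times, then return per-run timings in seconds."""
    for _ in range(warmup_runs):
        run_query()  # result discarded: these runs only prime the JIT and caches

    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload (an assumption); replace with a real search call.
    samples = time_after_warmup(lambda: sum(range(10_000)))
    print("median seconds:", statistics.median(samples))
```

Reporting the median of the measured runs, rather than the first run, keeps one-off cold-start costs out of the comparison.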


Re: ES performance questions

Karussell
In reply to this post by Chris Scribner
> The index size was about 3 gigs.

Hmmh, that should not be too much for the server...

Some questions ;)

Did you verify that ES is really using those 2 gigs? Are you starting
ES via the elasticsearch script? If so, there is a variable for
this. How many shards do you have? Have you changed any other settings,
or are you using different index settings? Is it a simple term query? Which
Java version are you using, and did you use "-server"?

Peter.

Re: ES performance questions

Chris Scribner
We are starting elasticsearch via the shell script, and setting memory
variables (ES_MIN_MEM, ES_MAX_MEM).
Shards: Whatever the default is (1, I think)
Index: We haven't changed any other settings. We ran the tests again
today with the automatically created index, and saw similar
performance.
Query:
{
    "fields": ["goodness"],
    "sort": [{"goodness": {"order": "desc"}}],
    "query": {
        "text": {
            "description": {
                "query": "randomWord1 randomWord2",
                "operator": "and"
            }
        }
    },
    "size": 50
}

Java version: 1.6.0_29. We did not use "-server".

As far as we can tell, ES is not using all the memory allocated. On
the same machine we tested on yesterday, it gets up to a max of about
200-250 MB. Watching the disk I/O, it's only reading about 700 KB/s
from the HD while the queries are running. We tested on a different
machine running Ubuntu, and it got up to about 350 MB. We verified
that it was respecting the memory settings, because when it built the
index it used up all the RAM allocated.

We built a test script that runs searches repeatedly (100 - 1000 at a
time). Running 100 searches in parallel, using the query above,
takes between 7 and 20 seconds, depending on how common the search
words are. (As expected, the searches run more slowly at first, before
the cache is primed. The numbers above are where it levels out
after a few runs.)
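
A test script along the lines described above might look something like this sketch. It is not the script used in the thread: build_query, run_batch, the word pairs, and the fake_search stand-in are all hypothetical names chosen for illustration. The search callable is passed in so that the same harness can wrap whatever HTTP client posts the body to the index's _search endpoint.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def build_query(word1: str, word2: str, size: int = 50) -> str:
    """Serialize the search body from the post above as strict JSON."""
    body = {
        "fields": ["goodness"],
        "sort": [{"goodness": {"order": "desc"}}],
        "query": {
            "text": {
                "description": {
                    "query": f"{word1} {word2}",
                    "operator": "and",
                }
            }
        },
        "size": size,
    }
    return json.dumps(body)


def run_batch(search: Callable[[str], object],
              bodies: List[str],
              workers: int = 100) -> float:
    """Submit all request bodies concurrently; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every search to finish and re-raises any error.
        list(pool.map(search, bodies))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical word pairs; a real run would draw them from the corpus.
    bodies = [build_query(f"word{i}", f"word{i + 1}") for i in range(100)]

    # fake_search is a placeholder for a real HTTP POST to _search.
    def fake_search(body: str) -> int:
        return len(body)

    print("batch seconds:", run_batch(fake_search, bodies))
```

Varying the word pairs per request, as in the thread, avoids timing only repeated identical queries.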

Re: ES performance questions

kimchy
Administrator
Regarding the amount of memory: elasticsearch (and Lucene) will use the memory it needs, and no more, even if more is available, at least when a search is executed. With the machine you have, it's hard to tell whether it's really slow or not. If you want to zip your data directory and Dropbox it, I can run the query and check here.


Re: ES performance questions

Chris Scribner
Shay,

Thanks for the response. We've since done some testing on "production
grade" boxes and the performance is quite acceptable.

The guess we made from this data is that Lucene doesn't attempt to
cache the underlying documents in memory -- it just stores the index
in memory. Does that sound about right?

Thanks,

Chris

Re: ES performance questions

kimchy
Administrator
It loads part of the inverted index's terms into memory. You can control how large that part is using the term index interval and divisor: http://www.elasticsearch.org/guide/reference/index-modules/.
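
As a rough illustration of how the interval and divisor interact: the term index keeps roughly every Nth term (the interval), and the divisor subsamples those again when the index is loaded. The sketch below is back-of-the-envelope arithmetic only. It assumes Lucene's default interval of 128 and a made-up number of unique terms, and the function name is hypothetical; it does not measure any real index.

```python
def terms_held_in_memory(total_terms: int, interval: int = 128, divisor: int = 1) -> int:
    """Approximate number of terms kept in the in-memory term index.

    Every `interval`-th term is written to the term index, and only every
    `divisor`-th of those is loaded, so about 1 in (interval * divisor)
    terms ends up in memory.
    """
    if total_terms < 0 or interval < 1 or divisor < 1:
        raise ValueError("total_terms must be >= 0; interval and divisor must be >= 1")
    return total_terms // (interval * divisor)


if __name__ == "__main__":
    unique_terms = 1_280_000  # hypothetical count, for illustration only
    for divisor in (1, 2, 4):
        loaded = terms_held_in_memory(unique_terms, divisor=divisor)
        print(f"divisor={divisor}: about {loaded} terms in memory")
```

A larger interval or divisor lowers memory use at the cost of a longer scan to locate each term.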
