High CPU usage on large EC2 nodes

High CPU usage on large EC2 nodes

rohit reddy
Hi,

I'm pretty new to Elasticsearch, though I have used Lucene extensively. We are currently migrating our project from Lucene to Elasticsearch.

We have created a basic Elasticsearch setup on AWS and are trying to test its performance.

The configuration:
EC2 Nodes - 2 Large nodes
Shards - 5
Replication - 1
Memory settings - 4GB

We have created a basic index of about 7GB. For the performance tests we have kept the index essentially constant, i.e., it is not being updated and no indexing requests are sent to the Elasticsearch nodes.

Now we are bombarding each Elasticsearch node with about 100 search requests per second (using a single JMeter client for this). Each search query is a boolean query with 5-6 term query clauses.

At this load the CPU utilization goes up to 75%. The performance of each query is still good: a query takes about 90ms to return its results.

We then reduced the shard count to 3 and ran the same tests.
The CPU usage remained the same but the performance degraded: each request now took about 180ms to return its results.

We expected the results to improve since we reduced the number of shards, but the opposite happened. Is this the expected result?
And is the high CPU usage also expected?

Thanks
Rohit 


Re: High CPU usage on large EC2 nodes

Karel Minařík
In general, yes, decreasing the number of shards should improve search performance (fewer Lucene indices to search against), but I suspect that in your benchmarking scenario there are many variables which are hard to keep consistent:

* The m1.large instance type is quite small; in a sense it has a lot of "neighbours" -- you never know who is doing what in the same rack
* The m2.xlarge is better in this sense, and also allows you to use the high-I/O EBS volumes
* A *lot* depends on the disk used for ES -- are you using the EBS-backed instance disk? The "physical" ephemeral disk of the instance? An extra EBS volume, possibly with provisioned IOPS?
* Regarding the CPU, I'd say it's expected that you'll saturate the machine's resources at some point, and ~100 req/sec sounds about right to me for the machine type in question. You can use the `hot_threads` API to check where the time is spent: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads.html
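A minimal sketch of calling that API from the shell; the host/port assume a default local node, and the threads/interval values are just illustrative:

```shell
# Sample the three hottest threads on each node over a 500ms interval
curl -s "http://localhost:9200/_nodes/hot_threads?threads=3&interval=500ms"
```

The output is a plain-text stack-trace summary per node, which makes it easy to see whether time goes to CPU work or to blocking reads.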

Karel

On Monday, January 28, 2013 6:52:19 PM UTC+1, rohit reddy wrote:
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: High CPU usage on large EC2 nodes

rohit reddy
We are using ephemeral disks with S3 backup, since we expect the performance of ephemeral disks to be better than EBS. And since our index is not updated very frequently, the overhead of storing backups in S3 is not huge.

I'll use the hot_threads API and try to identify what is taking up the CPU.

Thanks
Rohit

On Tuesday, January 29, 2013 1:12:34 PM UTC+5:30, Karel Minařík wrote:
 
 

Re: High CPU usage on large EC2 nodes

rohit reddy
Attached is the hot_threads snapshot taken with the Elasticsearch API.
I'm using DFS_QUERY_THEN_FETCH for the search.

https://gist.github.com/rohitreddy/4729660

It seems like most of the threads are waiting on reads from the Lucene index. Is this normal, or should I tweak some configuration to reduce it? I'm using all defaults for now.


On Thursday, January 31, 2013 11:37:19 PM UTC+5:30, rohit reddy wrote:
 
 

Re: High CPU usage on large EC2 nodes

kimchy
Administrator
It seems like it's waiting most of the time on reads. Which AWS instance type are you using? Make sure ~50% of the memory is allocated to ES (ES_HEAP_SIZE), and leave the other half to the OS, so the filesystem cache can hold the index files.
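As a sketch, on a box with 8GB of RAM that 50% rule could look like this (ES_HEAP_SIZE is the standard environment variable for this era of Elasticsearch; the 8GB machine size and 4g figure are assumptions for illustration):

```shell
# Give half the RAM to the ES heap; the OS keeps the rest for caching index files
export ES_HEAP_SIZE=4g
./bin/elasticsearch
```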

Also, which Java version are you using? Make sure you are on the latest 1.6 (update 34 and above) or 1.7. This makes a big difference, and on older Linux distros the default Java provided is pretty old (4 years old).

I would shy away from the DFS_ search types; typically you don't really need them with a big enough data set.
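Switching back just means dropping the dfs_ prefix from the search type; the index name and query below are made-up placeholders:

```shell
# The default query_then_fetch skips the extra distributed term-frequency round-trip
curl -s "http://localhost:9200/myindex/_search?search_type=query_then_fetch" -d '
{"query": {"bool": {"must": [{"term": {"field1": "value1"}}]}}}'
```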

Last, the reason more shards performed better is that, even on 2 nodes, each search request was parallelized across more shards (each holding less data). Note: if you start running tests with concurrent clients, make sure to configure the search thread pool with a fixed size of about 4 times the number of CPUs you have, so it won't be overflowed by the concurrent executions.
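The sizing rule of thumb works out as follows; the 2-core figure for an m1.large and the threadpool.search.* settings keys are assumptions based on the configuration style of this era:

```shell
# 4x rule of thumb: a fixed search pool of 4 threads per CPU core
CPUS=2                                    # an m1.large has 2 virtual cores
echo "threadpool.search.type: fixed"
echo "threadpool.search.size: $((CPUS * 4))"   # prints 8 for a 2-core node
```

The two echoed lines are the shape such settings would take in elasticsearch.yml.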

On Feb 7, 2013, at 10:01 AM, rohit reddy <[hidden email]> wrote:
