# of shards vs. open files


# of shards vs. open files

Javier Muniz
Does the number of shards a node holds affect the number of open files it requires?  I am running a node with more than 3000 shards because I have broken the data into a single-index-per-customer layout, and I am hitting the "too many open files" problem more and more.  I recently had to bump the open-files limit past 32000 (verified via -Des.max-open-files=true) just to keep adding customers to the node.
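
A quick way to see where a node actually stands is to ask it for its process stats. This is a rough sketch, not from the original post: it assumes a node on localhost:9200, and the endpoint and field names shown are from recent Elasticsearch versions (0.17-era releases may expose them differently).

```python
# Rough sketch: query per-node process stats over HTTP and print
# file-descriptor usage. Field names and the URL are the modern forms
# and may differ in older releases.
import requests

stats = requests.get("http://localhost:9200/_nodes/stats/process").json()
for node_id, node in stats["nodes"].items():
    proc = node.get("process", {})
    print(node.get("name", node_id),
          "open fds:", proc.get("open_file_descriptors"),
          "max fds:", proc.get("max_file_descriptors"))
```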

I guess my question is: should I put all of these customers into a single index to reduce the total number of shards, reduce the number of shards per index, or is my problem unrelated to sharding altogether?

-javier

Re: # of shards vs. open files

kimchy
Administrator
Yes, each shard is a Lucene index, which requires its share of open file handles (and memory, and so on). You can go with a single index and route based on user. It's simpler to do that with 0.17, since you can associate an alias with the username, and an alias can have a filter (to return results only for the relevant user) and a routing value (probably the username).
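
As a rough sketch of what that alias setup can look like (not from the original message: the index, alias, and field names are made up, and the syntax shown is the current REST form, so check the docs for your version):

```python
# Rough sketch of the alias-per-customer layout described above.
# "customers", "customer_acme", and "customer_id" are hypothetical names.
import requests

body = {
    "actions": [
        {
            "add": {
                "index": "customers",                         # the shared index
                "alias": "customer_acme",                     # one alias per customer
                "filter": {"term": {"customer_id": "acme"}},  # scope searches to this customer
                "routing": "acme",                            # route this customer's traffic to one shard
            }
        }
    ]
}
requests.post("http://localhost:9200/_aliases", json=body).raise_for_status()
```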



RE: # of shards vs. open files

Javier Muniz
Hmm. I think the problem I'll have then is that each customer has their own set of ids, so a document's id won't necessarily be unique across customers. Placing each customer in their own index was a simple way of dealing with this. Is there any out-of-the-box way to address this as well, or should I just start using a compound key?
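
For illustration, a compound key can be as simple as prefixing the customer's local id with the customer id. This sketch is not from the thread; the index name, the modern `_doc` endpoint, and the field names are assumptions.

```python
# Rough sketch of the compound-key idea: build the document id from the
# customer id plus the customer's local id so ids stay unique in a shared index.
import requests

def index_doc(customer, local_id, doc):
    doc_id = f"{customer}-{local_id}"                     # e.g. "acme-42"
    url = f"http://localhost:9200/customers/_doc/{doc_id}"
    # Route by customer so all of a customer's documents land on the same shard.
    resp = requests.put(url, params={"routing": customer}, json=doc)
    resp.raise_for_status()
    return resp.json()

index_doc("acme", 42, {"customer_id": "acme", "title": "hello world"})
```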

-javier




Re: # of shards vs. open files

kimchy
Administrator
There are ways to reduce the number of open files for each Lucene index. One option is to use the compound file format, which basically takes most of the files of a segment and compounds them into a single file. This is an expensive operation IO-wise, but whether you can afford it really depends on the use case. Another option is to reduce the number of segments by tweaking the merge policy: http://www.elasticsearch.org/guide/reference/index-modules/merge.html (you can tell which segments form an index using the segments API: http://www.elasticsearch.org/guide/reference/api/admin-indices-segments.html).
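
As a rough illustration of the settings involved (not from the original message: the index name is made up, and the `index.merge.policy.*` keys belong to 0.17-era releases and have since changed or been removed, so verify them against the merge-module docs linked above):

```python
# Rough sketch: update index settings to use the compound file format and
# merge more aggressively, then inspect the segments that make up the index.
# Setting names are illustrative and version-dependent.
import requests

settings = {
    "index.merge.policy.use_compound_file": True,  # pack each segment's files into one
    "index.merge.policy.merge_factor": 5,          # fewer segments -> fewer open files
}
requests.put("http://localhost:9200/customers/_settings", json=settings).raise_for_status()

# The segments API shows how many segments (and therefore files) an index has.
print(requests.get("http://localhost:9200/customers/_segments").json())
```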

Of course, another option is to start more nodes and have the shards spread across them, reducing the number of shards allocated per node.


