Too many open files exception even after raising the open file limit


Too many open files exception even after raising the open file limit

Lucas Ward
All,

I'm aware of the known issue with the limit of file descriptors, so
when I first got this issue I upped the limit.  I kept getting the
exception, so I kept upping it.  As an example, here is what ulimit -a
returns:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I've even tried cranking it up to 300K, and I still get the following
error:

19) Error injecting constructor, java.io.IOException: directory '/opt/elasticsearch-0.15.2/data/elasticsearch/nodes/0/indices/account_26/0/index' exists and is a directory, but cannot be listed: list() returned null
  at org.elasticsearch.index.store.fs.NioFsStore.<init>(NioFsStore.java:50)
  while locating org.elasticsearch.index.store.fs.NioFsStore
  at org.elasticsearch.index.store.StoreModule.configure(StoreModule.java:40)
  while locating org.elasticsearch.index.store.Store
    for parameter 3 at org.elasticsearch.index.shard.service.InternalIndexShard.<init>(InternalIndexShard.java:108)
  while locating org.elasticsearch.index.shard.service.InternalIndexShard
  at org.elasticsearch.index.shard.IndexShardModule.configure(IndexShardModule.java:39)
  while locating org.elasticsearch.index.shard.service.IndexShard
    for parameter 3 at org.elasticsearch.index.gateway.IndexShardGatewayService.<init>(IndexShardGatewayService.java:74)
  at org.elasticsearch.index.gateway.IndexShardGatewayModule.configure(IndexShardGatewayModule.java:40)
  while locating org.elasticsearch.index.gateway.IndexShardGatewayService

Or sometimes the 'too many open files' exception.  Once this happens,
the cluster is dead: I have to stop the process, delete the data
directory, and restart it.  When I try indexing again, I get the same
error at the same record count.  This is only about 80K records, with
a small fraction of the number of fields I will likely eventually
need, so it seems like it should be fine.  Also, lsof | wc -l is
showing a reasonable number (less than 10k), so I'm at a loss.
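
(A more targeted check than the system-wide lsof count is to look at the
descriptor table of the elasticsearch process itself; a rough sketch,
assuming Linux /proc, sufficient permissions, and a single elasticsearch
java process that pgrep can find by name:)

  ES_PID=$(pgrep -f elasticsearch | head -n 1)   # assumes one matching process
  ls /proc/$ES_PID/fd | wc -l                    # descriptors held by that process only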

What's even weirder is that when I run elasticsearch as a local
node in-process (in the same JVM rather than starting it in a
separate JVM), I am able to index the same number of records without
any issues.  I'm using Ubuntu; is there some kind of limit somewhere
else that I'm missing?  I'm at a bit of a loss.

Thanks in advance,

Lucas

Re: Too many open files exception even after raising the open file limit

kimchy
Administrator
Make sure that the increased open file limit actually applies to the elasticsearch process you start. You can start the script with a flag that logs the number of files ES can open on startup:

bin/elasticsearch -Des.max-open-files=true -f
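
(Once the node is up, another way to cross-check is to read the limit the
kernel actually applied to that process; a rough sketch, assuming Linux
/proc and that pgrep can find the elasticsearch java process:)

  ES_PID=$(pgrep -f elasticsearch | head -n 1)
  grep 'Max open files' /proc/$ES_PID/limits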


Re: Too many open files exception even after raising the open file limit

Lucas Ward
Thanks for the tip.  The output I'm getting is:

[2011-05-28 20:45:10,721][INFO ][bootstrap                ] max_open_files [998]

Which explains the issue.  And since the code in FileSystemUtils looks
like it just opens files in the tmp dir until it gets an IOException,
it's certainly more reliable than what ulimit is telling me.

So, my first thought was that my more general approach in
/etc/security/limits.conf was a problem.  I basically had this:

* - nofile 100000

'*' is supposed to mean every user, and '-' should mean both hard and
soft.  Obviously it isn't working, though.  I was running elasticsearch
as root to try and see if that would help.  Obviously not something I
would want to do permanently, but I'm grasping at straws a bit.  I
still had the same issue.  So I created a group and a user, both named
'elasticsearch', and then added them explicitly to limits.conf:

@elasticsearch hard nofile 32000
@elasticsearch soft nofile 32000
elasticsearch hard nofile 32000
elasticsearch soft nofile 32000

I switched to the elasticsearch user after doing a 'chown
elasticsearch:elasticsearch' and ran the command you mentioned.
However, it had no effect; I'm still getting 998 as the limit when
starting elasticsearch.  This is running on the standard Ubuntu Server
10.10 32-bit on Amazon EC2, which shouldn't be a problem, as that
seems like a very common use case from what I can tell on this mailing
list.  Has anyone else using a similar setup had this same problem?
Did I just do something wrong in limits.conf?
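
(One way to sanity-check the limits.conf entries independently of
elasticsearch is to open a fresh login shell as that user and ask for
the hard and soft limits directly; 'su -' gives a login shell, so PAM
and therefore limits.conf get a chance to apply.  A sketch:)

  su - elasticsearch -c 'ulimit -Hn; ulimit -Sn'   # hard limit, then soft limit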


Re: Too many open files exception even after raising the open file limit

James Cook
We had the very same problem with the wildcard syntax in limits.conf on the default AMI for Elastic Beanstalk's flavor of "Amazon Linux".

For us, this syntax in limits.conf did the trick:

tomcat        - nofile        32000

Our Elastic Search node is embedded in a web application, hence the tomcat username.
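
(For reference, the limits.conf fields are <domain> <type> <item> <value>,
and '-' sets both the soft and the hard limit; a hypothetical equivalent
entry for a dedicated 'elasticsearch' user would look like:)

  elasticsearch    -    nofile    32000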
 
-- jim



Re: Too many open files exception even after raising the open file limit

Lucas Ward
Interesting.  I added the following to my limits.conf:

root hard nofile 32000
root soft nofile 32000

Then I ran elasticsearch as root, which worked.  I still have no idea
why running it as the elasticsearch user doesn't work, though.
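
(One thing worth checking when explicit limits.conf entries still seem to
be ignored for a particular user is whether pam_limits is enabled in the
PAM session stacks that su and login go through; the path below is the
Debian/Ubuntu default, other distros may differ:)

  grep -R pam_limits /etc/pam.d/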


Re: Too many open files exception even after raising the open file limit

Peter Gillard-Moss
In reply to this post by Lucas Ward
I encountered this same problem on Ubuntu Precise, and I discovered the solution.

I would run ulimit like so:
  sudo -u elasticsearch sh -c 'ulimit -a'
That correctly returned the number I'd set in limits.conf (64000).

I tried the max-open-files debug trick but found it entirely unhelpful; it returned 'max_files [0]'.  Not good.  Instead I ran the following:
   curl -o - http://localhost:9200/_nodes?process

And I could see that it returned 'max_file_descriptors': 4096

So that confirmed that elasticsearch wasn't getting all the file handles as configured.
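
(For anyone repeating this check, something along these lines pulls out
just that field; the exact query parameters and JSON layout can vary
between elasticsearch versions:)

   curl -s 'http://localhost:9200/_nodes?process&pretty=true' | grep max_file_descriptors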

I discovered the problem is to do with the way in which elasticsearch is invoked.  In /etc/init/elasticsearch.conf you can see that it is run inside dash:

  su -s /bin/dash -c "/usr/bin/elasticsearch -f" elasticsearch

Running the same invocation, but with ulimit instead, returned 4096:
   sudo su -s /bin/sh -c 'ulimit -a' elasticsearch

So the problem must be to do with dash.

I discovered that I needed to add the following to /etc/pam.d/common-session for dash:
   session required        pam_limits.so

I had only added it to /etc/pam.d/common-session-noninteractive (which worked for sh).  It's probably safest to add it to both, although only common-session should be needed.

This fixed my problem.
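
(If anyone wants to confirm the fix, re-running the init script's style of
invocation should now report the raised limit; adjust the shell and user
to match your /etc/init/elasticsearch.conf:)

   sudo su -s /bin/dash -c 'ulimit -n' elasticsearch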