Too many files open

thinusp
Could someone perhaps help me troubleshoot this problem? Over the weekend I ran a test on my 5-node cluster, which started failing after a while.  After looking at the logs (which came to around 9 GB of data...) I found this error being thrown repeatedly:

[2012-03-26 14:00:15,270][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
        at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:236)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)

It seems as if there's some problem with "Too many open files", but this cannot be the case, or at least not the way I understand it.  I set the file limit for the user running the ElasticSearch instance to 32000, as per the suggestion on the website, and even if I check with "lsof | wc -l" there are in total only about 6000 descriptors open, so I doubt it's really the problem.  What bugs me (though I really do not know much about these things) is that it seems to be a "NioServerSocketPipelineSink" connection, and I'm not entirely sure how that relates to disk I/O.
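
(Side note: since "lsof | wc -l" counts descriptors for every process on the box, a more direct check against the ES process itself would be something like the following. This is a generic Linux sketch; the pgrep pattern is only illustrative and it needs to run as the same user or as root.)

# find the ES process id (pattern is illustrative)
ES_PID=$(pgrep -f org.elasticsearch)

# limit actually applied to that process
grep "Max open files" /proc/$ES_PID/limits

# number of descriptors the process currently holds
ls /proc/$ES_PID/fd | wc -l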

Another seemingly related issue is the following error which is also thrown into the mix:

org.elasticsearch.index.engine.IndexFailedEngineException: [dev-index][1] Index failed for [info#< omitted >]
        at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:482)
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:323)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:158)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:529)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:427)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.FileNotFoundException: /data2/esc/cluster0/nodes/0/indices/dev-index/1/index/_j3.fdx (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
        at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:418)
        at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:390)
        at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:84)
        at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:65)
        at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:108)
        at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:152)
        at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
        at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
        at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2066)
        at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:565)
        at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:477)
        ... 7 more

Again there's that "Too many open files" message.  Any idea as to what might be causing this problem?  Am I using the Java API wrong?  Thanks for the help - I appreciate it.

- Thinus

Re: Too many files open

Thomas Peuss
Hi Thinus!

On Monday, 26 March 2012 14:11:08 UTC+2, Thinus Prinsloo wrote:
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
        at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:236)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)

You need to raise the "ulimit" for the user that is running ES. Where you do this depends on your Linux distribution; on Red Hat you can add a file under /etc/security/limits.d.

Ours looks like this:
elastic  -     memlock  unlimited
elastic  soft  nofile   80000
elastic  hard  nofile   100000

Our user is called "elastic". The first line allows the ES JVM to lock as much memory as it wants to (memory-mapped files count as memory as well!). When the user reaches the soft limit, a warning is written to the log.
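
To check that the new limits are actually being picked up, something like this works (assuming pam_limits is enabled for the login path you use, and that the elastic user has a login shell):

# soft and hard open-file limits as seen by the elastic user
su - elastic -c 'ulimit -Sn; ulimit -Hn'

# locked-memory limit (for the memlock line)
su - elastic -c 'ulimit -l'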

CU
Thomas

Re: Too many files open

kimchy
Administrator
Also, check the nodes info and nodes stats APIs; they provide information on the max open file descriptors limit and the current number of open files. Verify through the nodes info API that your setting has actually taken effect.
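
Something along these lines should show both values (the exact paths and parameters here are from memory of the 0.19-era REST API, so double-check them against the docs for your version):

# nodes info: includes the max file descriptors limit the process sees
curl -s 'http://localhost:9200/_cluster/nodes?process=true&pretty=true'

# nodes stats: includes the number of file descriptors currently open
curl -s 'http://localhost:9200/_cluster/nodes/stats?pretty=true'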

Re: Too many files open

thinusp
Thanks - that's basically what I did.  When I tried to verify through the API as you suggested, I realised that the configuration I had used to set the limits did not work, so I added it to the ElasticSearch launch script instead.  What made things very difficult then was that even though the API reported the right limit, I still got the error, and the number of open file descriptors on that node was nowhere near the limit.  All I could gather was that, because of the earlier failures from running out of file descriptors, the index was now severely corrupted (not sure if it can happen like that), so this time, although it showed the same error, it actually could not find the file.  I started deleting some shards and had ES recover until the problem was sorted.  It was a bit messy though...
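
For what it's worth, the launch-script change amounts to raising the limit just before the JVM is started, something like this (a minimal sketch; the exact script and value will differ per install, and the user's hard limit still has to be at least this high or the call will fail):

# near the top of the ES start script, before the java process is launched
ulimit -n 32000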

Bottom line: make sure the setting is working properly by checking through the API, and don't let it carry on trying to index millions of documents once that error occurs.  Rather stop...  :)

--
Thinus Prinsloo
E-mail: [hidden email]
Cell: +27 82 339 2226



Re: Too many files open

q42jaap
Hi Shay,

In 0.19.9, the node stats don't seem to mention the max open file descriptor limit.
Do you have documentation somewhere on which request I have to make (with curl, for example)?

Thanks,

Jaap

Re: Too many files open

q42jaap
Shay,
Maybe you could also update the docs at
http://www.elasticsearch.org/guide/reference/setup/installation.html
to point to some of the tutorials that mention the service wrapper. I almost tried to configure the servicewrapper with elasticsearch myself, but luckily found your github project (https://github.com/elasticsearch/elasticsearch-servicewrapper).

Jaap
