warn which crashes server

16 messages

Szymon Gwóźdź
Hi!

I've launched a 4-machine ES cluster. 2 machines are data nodes; the other 2 are non-data nodes.
I've started putting data (via a prepared script) from 5 other machines. It's a lot of data: about 40 documents (~200 kB) per second coming from each machine. After a couple of minutes I get a log like this:

[11:17:25,431][WARN ][transport.netty          ] [Cold War] Exception caught on netty layer [[id: 0x03435ec9, /10.1.49.90:49601 => fts-test-4/10.1.49.89:9300]]
org.elasticsearch.transport.ResponseHandlerNotFoundTransportException: Transport response handler not found of id [166613]
        at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:71)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:391)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:345)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:332)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:323)
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:275)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:196)
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

And then this server crashes (because another node takes over the "Master" role). There is enough available memory. What is the problem?

cheers
Szymon Gwóźdź

Re: warn which crashes server

kimchy
Administrator
Sadly, this exception is caused by another problem that happened on another node. Which version are you running? If you could try master (or wait a couple of days for 0.7), that would be great.

cheers,
shay.banon

2010/5/11 Szymon Gwóźdź <[hidden email]>

Re: warn which crashes server

Szymon Gwóźdź
Hi!

I'm using master (v0.6.0-132-g040030d).

cheers
Szymon Gwóźdź

On 11 May 2010 at 15:42, Shay Banon <[hidden email]> wrote:

Re: warn which crashes server

kimchy
Administrator
Master is on version 0.7 (which will be released in a couple of days), so it's strange that you see 0.6...

cheers,
shay.banon

2010/5/12 Szymon Gwóźdź <[hidden email]>

Re: warn which crashes server

Szymon Gwóźdź
That's the name of the downloaded file. Even now, when I choose to download master, the name of the downloaded file contains "v0.6.0".

On 12 May 2010 at 12:40, Shay Banon <[hidden email]> wrote:

Re: warn which crashes server

kimchy
Administrator
I just downloaded it and got a file named elasticsearch-elasticsearch-a0b25ec. It's from here: http://www.elasticsearch.com/download/master/. Also, the version is printed in the logs/console when elasticsearch starts up.

shay.banon

2010/5/12 Szymon Gwóźdź <[hidden email]>

Re: warn which crashes server

Bart Schuller
In reply to this post by Szymon Gwóźdź

On May 12, 2010, at 13:10, Szymon Gwóźdź wrote:

> That's the name of downloaded file. Even now when I'm choosing downloading master the name of the downloaded file contains "v0.6.0")

That should be OK. A string like "v0.6.0-132-g040030d" is the output of "git describe", which shows the most recent tag (0.6.0, because 0.7.0 hasn't been released yet), followed by the number of commits since then (132), followed by "g" and the abbreviated hash that identifies the latest commit (040030d).
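To make the format concrete, here is a small Python sketch (not part of elasticsearch or git) that splits a long-form "git describe" string into the three parts described above. It assumes the <tag>-<count>-g<hash> shape; a bare tag (which git prints when the commit sits exactly on a tag) or a "-dirty" suffix is rejected rather than parsed.

```python
import re
from typing import NamedTuple


class DescribeResult(NamedTuple):
    tag: str            # most recent tag reachable from the commit, e.g. "v0.6.0"
    commits_since: int  # number of commits on top of that tag
    abbrev_hash: str    # abbreviated commit hash, without git's "g" prefix


# Long form: <tag>-<count>-g<hash>. The tag may itself contain dashes,
# so the count and hash are anchored at the end of the string.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_git_describe(text: str) -> DescribeResult:
    """Split a long-form `git describe` string into its three parts."""
    match = _DESCRIBE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a long-form git describe string: {text!r}")
    return DescribeResult(match["tag"], int(match["count"]), match["hash"])


print(parse_git_describe("v0.6.0-132-g040030d"))
# DescribeResult(tag='v0.6.0', commits_since=132, abbrev_hash='040030d')
```

Read this way, "v0.6.0-132-g040030d" means "132 commits after the v0.6.0 tag", which is consistent with a pre-0.7 master build.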

--
[hidden email]


Re: warn which crashes server

kimchy
Administrator
OK, so let's verify it: when you start the server, do you see the 0.7 snapshot version?

2010/5/12 Bart Schuller <[hidden email]>


Re: warn which crashes server

Szymon Gwóźdź
Yes, I used the 0.7 snapshot version.

On 12 May 2010 at 14:25, Shay Banon <[hidden email]> wrote:

Re: warn which crashes server

kimchy
Administrator
OK, can you send me the logs so I can have a look?

2010/5/12 Szymon Gwóźdź <[hidden email]>

Re: warn which crashes server

Szymon Gwóźdź
Logs (from all 4 nodes) are attached.

On 12 May 2010 at 14:39, Shay Banon <[hidden email]> wrote:

1.log (325K)
2.log (168K)
3.log (1M)
4.log (258K)

Re: warn which crashes server

kimchy
Administrator
Can you send the configuration you use as well? I think you have a misconfiguration in the gateway...

2010/5/12 Szymon Gwóźdź <[hidden email]>

Re: warn which crashes server

kimchy
Administrator
Also, can you try the latest master? I think I have fixed a problem where the master was detected as failed without waiting for the full timeout (it basically pings the master with a 6-second timeout, 5 times, before deciding that it has failed).
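A minimal sketch of that retry rule, written in Python purely for illustration and not taken from the elasticsearch source. It assumes the 5 pings are consecutive, that any successful ping means the master is alive, and it ignores any delay between pings; "ping" is a hypothetical callable standing in for the real transport-level ping.

```python
from typing import Callable

PING_TIMEOUT_S = 6.0  # per-ping timeout quoted in the thread
PING_ATTEMPTS = 5     # failed pings before the master is declared dead


def master_has_failed(ping: Callable[[float], bool],
                      timeout_s: float = PING_TIMEOUT_S,
                      attempts: int = PING_ATTEMPTS) -> bool:
    """Return True only after `attempts` consecutive pings have failed.

    `ping(timeout_s)` is hypothetical: it should return True if the master
    answered within `timeout_s` seconds and False on timeout or error.
    """
    for _ in range(attempts):
        if ping(timeout_s):
            return False  # one answer is enough to keep the master
    return True


def worst_case_detection_s(timeout_s: float = PING_TIMEOUT_S,
                           attempts: int = PING_ATTEMPTS) -> float:
    """Upper bound on detection time, ignoring any gap between pings."""
    return timeout_s * attempts


print(worst_case_detection_s())  # 30.0
```

Under these assumptions, a master that has really died is only declared failed after roughly 6 × 5 = 30 seconds; declaring it failed earlier than that is the premature detection described above.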

Another thing I noticed is that you create several indices. Can you check whether the JVMs are starting to struggle with memory? If you are familiar with Java, can you open visualvm or jconsole against them to check (the data nodes are the interesting ones)?

Lastly, I still need that configuration; I think the gateway is misconfigured.

cheers,
shay.banon

2010/5/12 Shay Banon <[hidden email]>

Re: warn which crashes server

kimchy
Administrator
I just fixed a major bug that caused the transport layer in elasticsearch to stop working (against a specific node): http://github.com/elasticsearch/elasticsearch/issues/#issue/170. This might explain why things stopped working completely for you.

I would still like to look at your gateway configuration.

cheers,
shay.banon

2010/5/12 Shay Banon <[hidden email]>

Re: warn which crashes server

Szymon Gwóźdź
Hi!

This is the configuration I used:

cluster:
    name:    fts060

node:
    data:    true

http:
    enabled:    false
   
path:
    work:    /var/es-test/

discovery:
    jgroups:
        config:    tcp
        bind_port:    9700
        tcpping:
            initial_hosts: 10.1.49.90[9700], 10.1.49.91[9700], 10.1.49.88[9700], 10.1.49.89[9700]

gateway:
    type: fs
    fs:
        location: /mnt/storage0/es-test

index:
    gateway:
        type: fs
        fs:
            location: /mnt/storage0/es-test

There is also another problem in this situation. While putting data simultaneously from a few machines (with about 20 threads putting data per machine), ES starts behaving weirdly after some time (5-15 minutes): from time to time the server does not respond immediately to requests, but instead stops responding to any PUT request for 5-60 seconds. Changing threadpool.cached.scheduled_size from 20 to 100 doesn't change anything.

cheers,
Szymon Gwóźdź

On 12 May 2010 at 21:01, Shay Banon <[hidden email]> wrote:

Re: warn which crashes server

kimchy
Administrator
So, your gateway configuration is not correct. You should remove the index gateway section, since with it, *all* indices will use the same location. If you remove it, the indices will automatically use the fs gateway (since that is what the top-level gateway is configured with), and each will get its own location. With your current configuration, the indices overwrite each other.
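The collision can be modelled with a short Python sketch. The settings are flattened to dotted keys, mirroring the nested YAML configuration posted above, and the per-index subdirectory rule is a hypothetical stand-in for whatever layout the fs gateway really uses; the only point is that one explicit index.gateway.fs.location sends every index to the same directory. The index names "idx1" and "idx2" are made up.

```python
import posixpath
from typing import Dict, List


def index_gateway_locations(settings: Dict[str, str],
                            indices: List[str]) -> Dict[str, str]:
    """Map each index name to the directory its fs gateway would write to.

    Hypothetical model (not elasticsearch's real layout): an explicit
    "index.gateway.fs.location" is used verbatim by every index; otherwise
    each index gets a subdirectory named after itself under the node-level
    "gateway.fs.location".
    """
    shared = settings.get("index.gateway.fs.location")
    base = settings["gateway.fs.location"]
    return {
        name: shared if shared is not None else posixpath.join(base, name)
        for name in indices
    }


def colliding_locations(locations: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every directory that more than one index would write to."""
    by_dir: Dict[str, List[str]] = {}
    for index, directory in locations.items():
        by_dir.setdefault(directory, []).append(index)
    return {d: sorted(names) for d, names in by_dir.items() if len(names) > 1}


as_posted = {
    "gateway.fs.location": "/mnt/storage0/es-test",
    "index.gateway.fs.location": "/mnt/storage0/es-test",
}
without_index_section = {"gateway.fs.location": "/mnt/storage0/es-test"}

print(colliding_locations(index_gateway_locations(as_posted, ["idx1", "idx2"])))
# {'/mnt/storage0/es-test': ['idx1', 'idx2']}
print(colliding_locations(index_gateway_locations(without_index_section, ["idx1", "idx2"])))
# {}
```

Whatever the real directory layout is, the shape of the fix is the same: keep only the top-level gateway section and let each index derive its own location from it.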

Regarding the pauses, I think you are overloading elasticsearch with all the indices you create. Can you monitor garbage collection on the JVMs (using jconsole or visualvm)?

Shay

2010/5/13 Szymon Gwóźdź <[hidden email]>