unicast not working after upgrade to 20.2


GX
Hi All

I upgraded my cluster from 19.8 to 20.2 and ended up with split-brain confusion. Here is what I did:
In config/elasticsearch.yml I have the following (this is the same before and after the upgrade):
path.data: /mnt/sda2/data/
path.logs: /mnt/sda2/logs/elasticsearch
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.9", "192.168.1.8", "129.168.1.7", "192.168.1.6"]

So I stopped all nodes (running 19.8), upgraded to 20.2, then started all nodes (I assume this is what is referred to as 'a full cluster restart'?).
When I noticed high CPU usage for 30 minutes, I investigated using bigdesk and noticed the nodes were in separate clusters.
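
For anyone hitting the same symptom, here is a minimal sketch of how the split could be confirmed without a GUI. It asks each node for its cluster health and flags any node that sees fewer nodes than expected. It assumes the HTTP API is reachable on port 9200 (the default; the thread does not say which port is in use) and relies on the `number_of_nodes` field of the `_cluster/health` response. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_health(host, port=9200, timeout=5):
    """Fetch the cluster health document from one node over HTTP.

    Port 9200 is an assumption (the Elasticsearch default).
    """
    with urlopen(f"http://{host}:{port}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def find_split(health_by_host, expected_nodes):
    """Return the hosts whose view of the cluster has fewer nodes than expected."""
    return sorted(
        host
        for host, health in health_by_host.items()
        if health.get("number_of_nodes", 0) < expected_nodes
    )


# Usage against live nodes (hosts are whatever your cluster uses):
# views = {h: fetch_health(h) for h in ["192.168.1.9", "192.168.1.8"]}
# print(find_split(views, expected_nodes=4))
```

A node that cannot be reached at all raises an error in `fetch_health` rather than appearing in the result, so an unreachable node needs separate handling.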

I did not find any reference to unicast or multicast in the changelogs, so I find this behaviour strange. I had to revert to 19.8 to bring things back to sanity.

Does anyone have any insight into this behaviour?

Btw, the reason I'm using unicast is that the network does not support multicast.

GX




Re: unicast not working after upgrade to 20.2

GX
OK, so unicast and multicast are too scary for anyone to touch, but can someone please at least clarify what is meant by "a full cluster restart"? Am I correct in assuming that all nodes need to be shut down, upgraded, then started, or can I upgrade the nodes one by one while the others are still running to minimize downtime?

GX

Re: unicast not working after upgrade to 20.2

dadoonet
IMHO a full cluster restart is a global shutdown, an upgrade of every node in the cluster, and then a restart. So, yes, you will have downtime, but perhaps under a minute if you have prepared everything beforehand...

HTH

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



Re: unicast not working after upgrade to 20.2

kimchy
Administrator
In reply to this post by GX
Did you manage to resolve this? A full cluster restart means shutting down all the nodes and starting them with the new version. Unicast discovery works the same as in 0.19.


Re: unicast not working after upgrade to 20.2

GX
Hi kimchy

Yes, I managed to resolve this.
I think the problem causing the split brain may have been a wrong IP in the list. Note that the third element starts with 129, not 192:

discovery.zen.ping.unicast.hosts: ["192.168.1.9", "192.168.1.8", "129.168.1.7", "192.168.1.6"]
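
A typo like this is easy to miss by eye. As a rough sketch, a check along the following lines could flag any host that falls outside the network most of the list shares. It assumes the entries are bare IPv4 addresses (no hostnames or `host:port` forms) and that the nodes sit in one /24; both are assumptions for illustration, and `odd_hosts` is a made-up helper name.

```python
import ipaddress
from collections import Counter


def odd_hosts(hosts, prefix_len=24):
    """Return the hosts that fall outside the most common /prefix_len network."""
    networks = {
        h: ipaddress.ip_network(f"{h}/{prefix_len}", strict=False) for h in hosts
    }
    # The network shared by the largest number of hosts is treated as "correct".
    most_common, _ = Counter(networks.values()).most_common(1)[0]
    return [h for h in hosts if networks[h] != most_common]


hosts = ["192.168.1.9", "192.168.1.8", "129.168.1.7", "192.168.1.6"]
print(odd_hosts(hosts))  # prints ['129.168.1.7']
```

This only catches addresses that land in a different network; a wrong last octet inside the same /24 would pass unnoticed.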

After that I managed to get all nodes into one cluster. At some point one of the nodes became unresponsive, and when I tried to restart it, it kept failing to start; restarting the other nodes failed as well. I panicked, stopped all nodes, switched back to 19.8 and tried to start again, and all nodes failed once more. At this point people were asking some very serious questions...

To cut a long story short, I looked into the logs, which complained about a special character not being allowed in the yml file. In one of the comments I found a French character ("a' la raid"); I changed it to a normal "a" and managed to start all nodes (on 20.2) fine. Status was green within a couple of minutes.

I can't understand why the conf file suddenly caused the nodes to stop working. I can only assume that when I fixed the IPs the file's encoding was changed (probably to UTF-8) and ES or Java didn't like that.
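
Whatever the exact cause, a quick way to locate such characters is to scan the raw bytes of the config file for anything outside ASCII. The sketch below does that and reports line numbers; it is only an illustration, it does not reproduce the parser's own check, and `non_ascii_lines` is a made-up helper name.

```python
def non_ascii_lines(raw):
    """Return (line_number, line_text) for every line containing a non-ASCII byte.

    `raw` is the file content as bytes, so the check does not depend on
    guessing the file's encoding first.
    """
    hits = []
    for number, line in enumerate(raw.splitlines(), start=1):
        if any(byte > 127 for byte in line):
            # Undecodable bytes are shown as replacement characters.
            hits.append((number, line.decode("utf-8", errors="replace")))
    return hits


# Usage (path relative to the Elasticsearch install directory):
# with open("config/elasticsearch.yml", "rb") as handle:
#     for number, text in non_ascii_lines(handle.read()):
#         print(number, text)
```

Reading the file in binary mode is deliberate: opening it as text could fail or silently re-decode the very bytes being looked for.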

All seems well now and things are running fine. I have been hounding the network administrators for an answer as to whether multicast is enabled, and if not why it's not possible, but have not managed to get a straight, coherent answer.

Regards

GX



Re: unicast not working after upgrade to 20.2

Ivan Brusic
Are you using ElasticSearch on Windows? I am wondering if you hit the same issue: https://github.com/elasticsearch/elasticsearch/pull/2389

The problem should have been "fixed" in 0.20.2, though.

-- 
Ivan


Re: unicast not working after upgrade to 20.2

GX
Hi Ivan

No, I'm not using Windows; I'm using Slackware (both development and production), but that is the exact error I had. I may have copied the 19.8 config yml file to 20.2 to keep my settings.

Thanks for the input

GX
