unassigned replica shards, and an unused node

unassigned replica shards, and an unused node

James Bardin
Hi, I have a production cluster (0.19.8) with a number of replica shards that are unassigned. All primary shards are accounted for, but some no longer have redundancy (cluster health is yellow). There is also a node with no shards whatsoever that seems to be perfectly fine otherwise; removing it and then rejoining it does nothing other than show up as removed and added in the master's log. I also tried the "reroute" API, but that seems to be a no-op on my version, as it just returns 200 and nothing happens.

Without replicas, I'm reluctant to simply restart other nodes just to see what happens. 

Is the shutdown API (is it in 0.19.8?) supposed to force the shards to other nodes, or does it simply stop the JVM?

Any other tips on how to proceed?

Thanks.

Re: unassigned replica shards, and an unused node

Radu Gheorghe-2
Hello James,

On Wed, Oct 31, 2012 at 3:53 PM, James Bardin <[hidden email]> wrote:

> Hi, I have a production cluster (0.19.8) with a number of replica shards
> that are unassigned. All primary shards are accounted for, but some no
> longer have redundancy (cluster health is yellow). There is also a node with
> no shards whatsoever, but seems to be perfectly fine otherwise, and removing
> that then rejoining it does nothing other than show as removed and added in
> the master's log. I also tried the "reroute" api, but that seems to be a
> noop on my version, as it just returns 200 and nothing happens.
>
> Without replicas, I'm reluctant to simply restart other nodes just to see
> what happens.
>
> Is using the shutdown api (is it in .19.8?) supposed to force the shards to
> other nodes, or does it simply stop the jvm?
>

The Shutdown API (available in 0.19.8) doesn't relocate the shards
before stopping the JVM. But Elasticsearch should automatically
redistribute replicas to your other nodes so that everything should be
OK eventually. Take a look here:
http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html
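
For reference, draining a node before a shutdown is usually done with a transient cluster setting that excludes it from allocation. The sketch below assumes your version honors `cluster.routing.allocation.exclude._ip`; the address `10.0.0.1`, the `ES_URL` default and the helper names are placeholders, not values from this thread.

```shell
# Placeholder endpoint; point it at any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: builds the transient settings body that excludes
# one IP address from shard allocation.
exclude_ip_body() {
  printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}' "$1"
}

# Hypothetical helper: sends the body to the cluster settings API.
# Defined here but not called, so pasting this block changes nothing.
drain_node() {
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(exclude_ip_body "$1")"
}

# Show the request body for a placeholder address.
exclude_ip_body 10.0.0.1; echo
```

Calling `drain_node 10.0.0.1` would send the request. Being transient, the setting does not survive a full cluster restart.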

> Any other tips on how to proceed?
>

Any clues in the logs of the empty node? If not, I would turn on
debugging and see if it brings out any new info.

Also, what is the state of your unallocated shards? Are they
initializing or simply "not allocated"?
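
One way to check this, as a rough sketch: the cluster health API accepts a `level` parameter, and `level=shards` should break the report down per shard, including how many copies are initializing or unassigned. The `ES_URL` default and the `health_url` helper are placeholders of mine.

```shell
# Placeholder endpoint; point it at any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: builds the health URL for a given detail level
# (cluster, indices or shards).
health_url() {
  printf '%s/_cluster/health?level=%s&pretty=true' "$ES_URL" "$1"
}

# Print the command instead of running it, so this is safe to paste.
echo "curl -s '$(health_url shards)'"
```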

Do you have any shard allocation settings defined?
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html

I'd also try disabling replicas and enabling them again using the
Indices Update Settings API:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html

I know, it sounds a lot like "did you try turning it off and on
again?", but who knows :)
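
A minimal sketch of that off-and-on step with the Indices Update Settings API. The index name `index_name`, the `ES_URL` default, the replica count of 1 and the helper names are all assumptions; use whatever your index had before.

```shell
# Placeholder endpoint and index name.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-index_name}"

# Hypothetical helper: builds the settings body for a replica count.
replica_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Hypothetical helper: applies the replica count to the index.
# Defined here but not called, so pasting this block changes nothing.
set_replicas() {
  curl -s -XPUT "$ES_URL/$INDEX/_settings" -d "$(replica_body "$1")"
}

# Show the two request bodies: drop replicas, then re-create them.
replica_body 0; echo
replica_body 1; echo
```

Running `set_replicas 0` followed by `set_replicas 1` would do the actual toggle.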

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

Re: unassigned replica shards, and an unused node

James Bardin
On Thu, Nov 1, 2012 at 3:45 AM, Radu Gheorghe
<[hidden email]> wrote:

> The Shutdown API (available in 0.19.8) doesn't relocate the shards
> before stopping the JVM. But Elasticsearch should automatically
> redistribute replicas to your other nodes so that everything should be
> OK eventually. Take a look here:
> http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html
>

Ah, I hadn't seen that with the transient setting before. Wish I had.


> Any clues in the logs of the empty node? If not, I would turn on
> debugging and see if it brings out any new info.
>

Nothing in the logs that I can see. That node did pick up some shards
after restarting nodes which had redundant data though, so there's not
much else I can investigate there.

> Also, what is the state of your unallocated shards? Are they
> initializing or simply "not allocated"?
>

Yeah, I have one index now where all replicas are simply "UNASSIGNED".

> Do you have any shard allocation settings defined?
> http://www.elasticsearch.org/guide/reference/index-modules/allocation.html
>

Yes, there are some routing.allocation settings on the index
missing its replicas, from when we had to push some indexes around a
while back. My hunch is that this is related, as it's now the only
difference between indexes in the cluster. It now has include.name
and include.tag settings. There are no more node tags, so include.tag
can't match anything now, and I wonder if it's overriding include.name.
I really want to just remove these settings now, but I haven't found
any way to do so without building a new index.

> I'd also try disabling replicas and enabling them again using the
> Indices Update Settings API:
> http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
>

Tried that, but new replicas just show up immediately as UNASSIGNED,
and nothing happens.


> I know, it sounds a lot like "did you try turning it off and on
> again?", but who knows :)
>

May have to resort to that. I could get in a point-release update too.

Thanks,
-james

Re: unassigned replica shards, and an unused node

James Bardin
>> The Shutdown API (available in 0.19.8) doesn't relocate the shards
>> before stopping the JVM. But Elasticsearch should automatically
>> redistribute replicas to your other nodes so that everything should be
>> OK eventually. Take a look here:
>> http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html
>>
>

Excluding all allocations didn't work for the index in question. All
other shards moved accordingly though. Nothing in the logs.


>
>> I know, it sounds a lot like "did you try turning it off and on
>> again?", but who knows :)
>>
>
> May have to resort to that. I could get in a point-release update too.

A rolling restart of the service did nothing for this index. It
totally disappeared during recovery, and then only the primaries came
back online. I'll plan on the 0.19.11 update asap, as I see there are
some code changes around index allocation.

Re: unassigned replica shards, and an unused node

Igor Motov-3
Try setting include.name and include.tag to "*" for this index
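
As a sketch, that suggestion could be sent through the index settings API like this. The index name `index_name`, the `ES_URL` default and the helper names are placeholders; the two setting keys are the full `index.routing.allocation` forms of the names above.

```shell
# Placeholder endpoint and index name.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-index_name}"

# Hypothetical helper: builds a body that sets both include filters to "*".
wildcard_include_body() {
  printf '%s' '{"index.routing.allocation.include.name":"*","index.routing.allocation.include.tag":"*"}'
}

# Hypothetical helper: applies the body to the index.
# Defined here but not called, so pasting this block changes nothing.
apply_wildcard_include() {
  curl -s -XPUT "$ES_URL/$INDEX/_settings" -d "$(wildcard_include_body)"
}

# Show the request body.
wildcard_include_body; echo
```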

On Thursday, November 1, 2012 2:38:38 PM UTC-4, James Bardin wrote:
>>> The Shutdown API (available in 0.19.8) doesn't relocate the shards
>>> before stopping the JVM. But Elasticsearch should automatically
>>> redistribute replicas to your other nodes so that everything should be
>>> OK eventually. Take a look here:
>>> http://elasticsearch-users.115913.n3.nabble.com/how-do-I-relocate-shards-from-a-node-prior-to-shutting-it-down-td4024570.html
>
> Excluding all allocations didn't work for the index in question. All
> other shards moved accordingly though. Nothing in the logs.
>
>>> I know, it sounds a lot like "did you try turning it off and on
>>> again?", but who knows :)
>>
>> May have to resort to that. I could get in a point-release update too.
>
> A rolling restart of the service did nothing for this index. It
> totally disappeared during recovery, and then only the primaries came
> back online. I'll plan on the 0.19.11 update asap, as I see there are
> some code changes around index allocation.

Re: unassigned replica shards, and an unused node

James Bardin
On Thu, Nov 1, 2012 at 4:58 PM, Igor Motov <[hidden email]> wrote:
> Try setting include.name and include.tag to "*" for this index

Tried that too with no results. I wonder if include.tag takes precedence.

I noticed that @kimchy recently changed the routing allocation code to
treat an empty string as not set -- hoping that alleviates the
problem. It would seem there has to be some way to remove the
index.routing.allocation settings on this index (maybe the Java API? I
haven't checked it out at all yet).

Thanks,
-james

Re: unassigned replica shards, and an unused node

Shairon Toledo
This has happened to us a few times. The only way I could force shard allocation was by _id (node IDs separated by commas), like this:

    curl -XPUT localhost:9200/_cluster/settings -d '{
        "transient" : {
            "cluster.routing.allocation.include._id" : "YxZ92dZCTE2QTUHMrf9s5Q,eBSlwQ2QRNu4hs1mcjWGSQ,XelXrOfRTqmZiP1H_9X6uw",
            "cluster.routing.allocation.cluster_concurrent_rebalance": 10
        }
    }'

Or per index:

    curl -XPUT localhost:9200/index_name/_settings -d '{
        "index.routing.allocation.include._id" : "eBSlwQ2QRNu4hs1mcjWGSQ"
    }'

The problem is that if any node restarts, I need to perform the PUT again.
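
Since the PUT has to be repeated, a small wrapper can rebuild the body from a list of node IDs. This is only a sketch: `join_ids`, `include_id_body`, `include_by_id`, the `ES_URL` default and `index_name` are placeholder names, and the IDs in the example are the ones from the commands above.

```shell
# Placeholder endpoint and index name.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-index_name}"

# Hypothetical helper: joins its arguments with commas (a b c -> a,b,c).
join_ids() {
  old_ifs=$IFS
  IFS=,
  printf '%s' "$*"
  IFS=$old_ifs
}

# Hypothetical helper: builds the index-level include._id settings body.
include_id_body() {
  printf '{"index.routing.allocation.include._id":"%s"}' "$(join_ids "$@")"
}

# Hypothetical helper: applies the filter to the index.
# Defined here but not called, so pasting this block changes nothing.
include_by_id() {
  curl -s -XPUT "$ES_URL/$INDEX/_settings" -d "$(include_id_body "$@")"
}

# Show the request body for two of the node IDs used above.
include_id_body YxZ92dZCTE2QTUHMrf9s5Q eBSlwQ2QRNu4hs1mcjWGSQ; echo
```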





On Thu, Nov 1, 2012 at 7:09 PM, James Bardin <[hidden email]> wrote:
> On Thu, Nov 1, 2012 at 4:58 PM, Igor Motov <[hidden email]> wrote:
>> Try setting include.name and include.tag to "*" for this index
>
> Tried that too with no results. I wonder if include.tag takes precedence.
>
> I noticed that @kimchy recently changed the routing allocation code to
> treat an empty string as not set -- hoping that alleviates the
> problem. It would seem there has to be some way to remove the i.r.a
> settings on this index (maybe the java api? I haven't checked it out
> at all yet).
>
> Thanks,
> -james

--
Shairon Toledo
http://hashcode.me

Re: unassigned replica shards, and an unused node

James Bardin
On Thu, Nov 1, 2012 at 5:18 PM, Shairon Toledo <[hidden email]> wrote:
> curl -XPUT localhost:9200/index_name/_settings -d '{
>       "index.routing.allocation.include._id" : "eBSlwQ2QRNu4hs1mcjWGSQ"
>   }'

OK, using _id at the cluster level did nothing, but at the index
level, shards started relocating, and the replicas all started!

Not a permanent fix, but it seems there is a bug in the routing
allocation. I really want to get those settings out of this index.

Re: unassigned replica shards, and an unused node

James Bardin
So 0.19.11 has mitigated the problem for us. I'm guessing issue #2229
[1] is what really helped, in that empty strings are now ignored. New
replicas are allocated as expected.


1. https://github.com/elasticsearch/elasticsearch/issues/2229
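
If empty strings really are treated as unset from 0.19.11 on, clearing the stale filters might look like the sketch below. I have not verified this against every version, so treat that behavior as an assumption; `index_name`, the `ES_URL` default and the helper names are placeholders.

```shell
# Placeholder endpoint and index name.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-index_name}"

# Hypothetical helper: builds a body that blanks both include filters.
# Assumption: an empty value is ignored by the allocation code.
clear_include_body() {
  printf '%s' '{"index.routing.allocation.include.name":"","index.routing.allocation.include.tag":""}'
}

# Hypothetical helper: applies the body to the index.
# Defined here but not called, so pasting this block changes nothing.
clear_include() {
  curl -s -XPUT "$ES_URL/$INDEX/_settings" -d "$(clear_include_body)"
}

# Show the request body.
clear_include_body; echo
```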

Re: unassigned replica shards, and an unused node

Ivan Brusic
I missed this thread while I was debugging the same issues: https://groups.google.com/d/msg/elasticsearch/6aGAUDtNtWw/48RZW9YRZ1QJ

No allocation settings are working for me. Cluster is unusable as-is since shards can no longer be allocated. The whole point of my tests was to do a rolling upgrade, but now it appears that a full cluster restart is needed. A bit extreme.

-- 
Ivan

On Fri, Nov 2, 2012 at 10:33 AM, James Bardin <[hidden email]> wrote:
> So 0.19.11 has mitigated the problem for us. I'm guessing issue #2229
> [1] is what really helped, in that empty strings are now ignored. New
> replicas are allocated as expected.
>
> 1. https://github.com/elasticsearch/elasticsearch/issues/2229
