Perma-Unallocated primary shards after a node has left the cluster

Perma-Unallocated primary shards after a node has left the cluster

Alex Schokking
Hi guys, I would really appreciate some help understanding what's going on with shard allocation in this case:

Elasticsearch version: 1.4.4

We had 3 nodes, with 1 primary shard and 1 replica per index (so 2 copies of everything in total). One node went down and the cluster went red. It started to reallocate shards as expected; originally there were ~50 unallocated shards, 15 of them primaries and the rest replicas.

It's been a few hours now and there are still 15 outstanding shards, all primaries, that don't seem to be getting reallocated. I thought this would be a pretty standard scenario, so I was really hoping I wouldn't need to manually walk through and reallocate the primary shards, but I'm not sure what else to try at this point to get back to green. Any pointers would be really appreciated. Here are some of the relevant-seeming bits that folks asked about on IRC:

In the ES logs, for the unallocated index names, there are lines along the lines of:
[2015-04-29 22:08:22,803][DEBUG][action.admin.indices.stats] [Agent Axis] [webaccesslogs-2015.04.24][0], node[-r2iQnH4R-mcUy4NicCB5g], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@6a564a91] org.elasticsearch.transport.SendRequestTransportException: [Jean-Paul Beaubier][inet[/10.155.165.126:9300]][indices:monitor/stats[s]]
"Jean-Paul Beaubier" is the node that went down

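For anyone triaging a batch of similar log lines, here is a minimal Python sketch that pulls the index, shard number, primary/replica flag, routing state, and the unreachable node out of a line shaped like the one above. The regular expressions are assumptions fitted to this single sample, not a documented Elasticsearch log format, and the helper name `parse_shard_failure` is made up for illustration.

```python
import re

# Sample line copied from the post above (split across strings for readability).
SAMPLE = (
    "[2015-04-29 22:08:22,803][DEBUG][action.admin.indices.stats] [Agent Axis] "
    "[webaccesslogs-2015.04.24][0], node[-r2iQnH4R-mcUy4NicCB5g], [P], s[STARTED]: "
    "failed to execute [org.elasticsearch.action.admin.indices.stats."
    "IndicesStatsRequest@6a564a91] "
    "org.elasticsearch.transport.SendRequestTransportException: "
    "[Jean-Paul Beaubier][inet[/10.155.165.126:9300]][indices:monitor/stats[s]]"
)

# Assumption: the shard routing part looks like "[index][shard], node[id], [P|R], s[STATE]".
SHARD_RE = re.compile(
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\], "
    r"node\[(?P<node_id>[^\]]+)\], "
    r"\[(?P<prirep>[PR])\], "
    r"s\[(?P<state>[A-Z_]+)\]"
)

# Assumption: the target of the failed request follows the exception class name.
TARGET_RE = re.compile(
    r"SendRequestTransportException: "
    r"\[(?P<name>[^\]]+)\]\[inet\[/(?P<addr>[^\]]+)\]\]"
)


def parse_shard_failure(line):
    """Return a dict describing the shard and target node, or None if no match."""
    shard = SHARD_RE.search(line)
    if shard is None:
        return None
    target = TARGET_RE.search(line)
    return {
        "index": shard.group("index"),
        "shard": int(shard.group("shard")),
        "node_id": shard.group("node_id"),
        "primary": shard.group("prirep") == "P",
        "state": shard.group("state"),
        "target_node": target.group("name") if target else None,
        "target_addr": target.group("addr") if target else None,
    }


if __name__ == "__main__":
    print(parse_shard_failure(SAMPLE))
```

Running this over the whole log and collecting the distinct `(index, shard)` pairs would give a list to compare against the unassigned shards.
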
_cat/recovery
shards disk.used disk.avail disk.total disk.percent host              ip             node        
   420    21.2gb       77gb     98.3gb           21 ip-10-234-164-148 10.234.164.148 Agent Axis  
   420      41gb     57.2gb     98.3gb           41 ip-10-218-145-237 10.218.145.237 Ebon Seeker 
    15                                                                               UNASSIGNED 
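
As a quick sanity check on the table above, the following Python sketch totals the shard counts per node and the unassigned count from the pasted text. It assumes the column layout shown here (shard count first, node name last, node names possibly containing spaces); the function name `summarize_allocation` is hypothetical.

```python
# Table copied from the post above.
ALLOCATION_OUTPUT = """\
shards disk.used disk.avail disk.total disk.percent host              ip             node
   420    21.2gb       77gb     98.3gb           21 ip-10-234-164-148 10.234.164.148 Agent Axis
   420      41gb     57.2gb     98.3gb           41 ip-10-218-145-237 10.218.145.237 Ebon Seeker
    15                                                                               UNASSIGNED
"""


def summarize_allocation(text):
    """Return ({node_name: shard_count}, unassigned_count) from the table text."""
    rows = [line for line in text.splitlines() if line.strip()]
    assigned = {}
    unassigned = 0
    for row in rows[1:]:  # skip the header row
        fields = row.split()
        count = int(fields[0])
        if len(fields) == 2 and fields[1] == "UNASSIGNED":
            # The unassigned row has only a count and the UNASSIGNED marker.
            unassigned += count
        else:
            # Seven fixed columns come first; the rest is the node name.
            assigned[" ".join(fields[7:])] = count
    return assigned, unassigned


if __name__ == "__main__":
    assigned, unassigned = summarize_allocation(ALLOCATION_OUTPUT)
    print(assigned, unassigned, sum(assigned.values()) + unassigned)
```
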

I'm trying to understand why it's stuck in this state, given that there is no other info in the logs, as far as I can tell, about why the shards can't be allocated. Shouldn't the replicas just be promoted in place to new primaries and then new replicas created on the other node?
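
To see exactly which primaries are stuck, one option is to filter the text output of `_cat/shards` for rows that are both primary and `UNASSIGNED`. The Python sketch below does that on a made-up sample (only the index name is taken from the log line above; the doc counts and sizes are invented). It also builds, without sending, a request body for the 1.x `_cluster/reroute` `allocate` command, which is the manual route mentioned earlier. The helper names are hypothetical, and my understanding is that forcing a primary this way needs `allow_primary` set to true and can create an empty shard if the target node holds no copy of the data, so treat it as a last resort rather than a fix.

```python
import json

# Hypothetical sample in the shape of `_cat/shards` output:
# index shard prirep state [docs store ip node]
SAMPLE_CAT_SHARDS = """\
webaccesslogs-2015.04.24 0 p UNASSIGNED
webaccesslogs-2015.04.24 0 r UNASSIGNED
webaccesslogs-2015.04.25 0 p STARTED 1200 3.4mb 10.234.164.148 Agent Axis
webaccesslogs-2015.04.25 0 r STARTED 1200 3.4mb 10.218.145.237 Ebon Seeker
"""


def unassigned_primaries(cat_shards_text):
    """Return a list of (index, shard_number) for unassigned primary shards."""
    found = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed row
        index, shard, prirep, state = fields[:4]
        # A header row (present with ?v) never has prirep == "p", so it is skipped.
        if prirep == "p" and state == "UNASSIGNED":
            found.append((index, int(shard)))
    return found


def reroute_body(index, shard, node, *, allow_primary):
    """Build (but do not send) a body for POST _cluster/reroute.

    Caution: allow_primary=True on a node without the shard's data is
    understood to produce an empty primary, i.e. data loss.
    """
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "allow_primary": allow_primary,
                }
            }
        ]
    }


if __name__ == "__main__":
    for index, shard in unassigned_primaries(SAMPLE_CAT_SHARDS):
        body = reroute_body(index, shard, "Agent Axis", allow_primary=True)
        print(json.dumps(body, indent=2))
```

Printing the bodies first, rather than posting them, leaves room to check whether a surviving node still has the shard files on disk before forcing anything.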

Thanks and regards -- Alex 

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9adda07d-88b0-4fa2-805b-37d4739d6f1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Perma-Unallocated primary shards after a node has left the cluster

Alex Schokking
Probably super evident, but the output above was actually from _cat/allocation?v, not _cat/recovery. Sorry about that.

On Wednesday, April 29, 2015 at 5:19:08 PM UTC-7, Alex Schokking wrote: