unassigned primary *and* replica shards


unassigned primary *and* replica shards

Matthias Johnson
Greetings, I have an 8 node cluster with 4 nodes each in different data centers. Recently the cluster became partitioned (i.e. couldn't see each other) due to firewalls.

I have removed the firewall issue and all nodes once again show as part of the cluster, but 8 indexes each show a pair of unassigned shards: a primary and its replica (for example, both the primary and the replica of shard 3 of the same index).

{
  "cluster_name" : "xxxxxxxx",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 500,
  "active_shards" : 1000,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 16
}


I've looked into the reroute API and tried the following:

curl -XPOST -s 'http://localhost:9200/_cluster/reroute?pretty=true' -d '{
    "commands" : [
        { "allocate" : {
              "index" : "0db5bb70b5d19bfceb5cebd5878fbcb905787745",
              "shard" : 3,
              "node" : "_9FezNn4TgWjftFeKBAMJw",
              "allow_primary" : 1 }
        }
    ]
}'


which returns without an error but fails to allocate the shard. If I leave "allow_primary" off, I get an error, as I would expect from reading the API notes above.
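
For reference, a minimal sketch of one way to see which shards the cluster considers unassigned. Both calls should be available on a 0.20.x cluster, but treat this as a sketch and check them against your version's docs:

# Dump the routing information and look for UNASSIGNED shard entries
# (the surrounding lines show the shard number and index name):
curl -s 'http://localhost:9200/_cluster/state?pretty=true' | grep -A 6 'UNASSIGNED'

# Shard-level health also shows which indexes and shards are red:
curl -s 'http://localhost:9200/_cluster/health?level=shards&pretty=true'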

The question is this:

How do I recover from this failure and assign the unassigned shards, which come in primary/replica pairs?

Any help would be greatly appreciated.

@matthias


Re: unassigned primary *and* replica shards

Matthias Johnson
I should also mention that we are running version 0.20.2.

@matthias


Re: unassigned primary *and* replica shards

Matthias Johnson
In reply to this post by Matthias Johnson
Well, I managed to get our cluster back to all green status.

I did this by leaning on karmi's excellent Tire library.

Essentially, for each broken index I re-indexed it into a new index, deleted the old one, and then re-indexed back into the original name.

Here is the super-brief Ruby snippet:

#!/usr/bin/ruby

require 'rubygems'
require 'tire'

Tire.configure do
  url "http://localhost:9200"
end

# Copy the broken index into a fresh one.
Tire.index('broken').reindex 'broken-recover'


When that finished and I was sure the copy looked good, I simply modified the code to reverse the index names and re-ran it.
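
A minimal sketch of the checks and the delete step between the two passes (the index names are the placeholders from the snippet above; _count and delete-index are the standard APIs of this era):

# Sanity-check that the copy holds the same number of documents as the original:
curl -s 'http://localhost:9200/broken/_count?pretty=true'
curl -s 'http://localhost:9200/broken-recover/_count?pretty=true'

# Once the counts match, drop the broken index so its name is free for the copy back:
curl -XDELETE 'http://localhost:9200/broken'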

I'm not sure if this is the most graceful solution, but it seemed to work in this case, with very small indexes (i.e. they were just recently created).

Cheers,

@matthias
 


Re: unassigned primary *and* replica shards

q42jaap
You can use aliases to skip that last re-index; we use that a lot (we add timestamps to index names to indicate which index is the latest). Using an alias you can even swap from the old index to the new one, and revert back if needed.
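
For example, a minimal sketch of an alias swap (the index and alias names are placeholders):

curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions" : [
        { "remove" : { "index" : "myindex-20130322", "alias" : "myindex" } },
        { "add"    : { "index" : "myindex-20130325", "alias" : "myindex" } }
    ]
}'

# Clients keep querying "myindex"; both actions are applied in a single cluster update.
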
Jaap Taal

[ Q42 BV | tel 070 44523 42 | direct 070 44523 65 | http://q42.nl |
Waldorpstraat 17F, Den Haag | Vijzelstraat 72 unit 4.23, Amsterdam |
KvK 30164662 ]



Re: unassigned primary *and* replica shards

Matthias Johnson
In reply to this post by Matthias Johnson
Jaap, thanks for that idea. I'll keep that in mind for the future!

@matthias


Re: unassigned primary *and* replica shards

Jeff Miller
We just had a very similar problem yesterday on version 19.10, in a case where machines in the cluster became unhealthy and had to be shut down. In a couple of cases, after we'd reset the nodes, both the primary and the replica of a shard needed to be recovered, and the system seemed unable to do so successfully.

We tried manual recovery with allow_primary, as in Matthias' example, and also found that it had no effect.

The only solution we were able to find was to recreate the index, although we did it without the Tire library; we used aliasing, as Jaap suggests, to make it easier to cut over once the new index was ready.
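
A minimal sketch of the cut-over check, with a placeholder index name (per-index cluster health with wait_for_status is assumed to be available on these versions; verify against your docs):

# Block until the rebuilt index is green (or the timeout expires), then swap the alias
# over to it as in the _aliases example earlier in the thread:
curl -s 'http://localhost:9200/_cluster/health/myindex-rebuilt?wait_for_status=green&timeout=60s&pretty=true'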

Jeff Miller
