Restarting of node taking much time

Restarting of node taking much time

Ankit Jain
Hi All,

We have deployed a 5-node cluster, and each node is serving around 200 shards (the total number of indices is 200, and each index has 200 shards).

While restarting, each node takes around 20 to 30 minutes to move all shards from the unassigned state to the assigned state.

How can we quickly move all the shards from the unassigned state to the assigned state?

Is the recovery time dependent on data size?

Regards,
Ankit Jain

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Restarting of node taking much time

Ivan Brusic
Are the restarts planned or are they server crashes? If they are planned, you should disable indexing (if possible), flush the index, and temporarily disable allocation.
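
A minimal sketch of the planned-restart sequence described above, written as a list of REST calls rather than a live client. The endpoint paths and the `cluster.routing.allocation.enable` setting are assumptions based on the Elasticsearch 1.x REST API (0.90 used a different allocation setting name), and `planned_restart_steps` is a hypothetical helper. Stopping the indexing clients happens outside Elasticsearch and is not shown.

```python
import json


def planned_restart_steps(allocation_setting="cluster.routing.allocation.enable"):
    """Return the REST calls for a planned restart as (method, path, body) tuples.

    The setting name is an assumption; pass a different one for older versions.
    """
    def settings(value):
        # Transient settings are dropped on a full cluster restart.
        return json.dumps({"transient": {allocation_setting: value}})

    return [
        ("POST", "/_flush", None),                        # flush all indices
        ("PUT", "/_cluster/settings", settings("none")),  # disable allocation
        # ... restart the node here, then wait for it to rejoin ...
        ("PUT", "/_cluster/settings", settings("all")),   # re-enable allocation
    ]


for method, path, body in planned_restart_steps():
    print(method, path, body or "")
```

The calls are only printed here, so the list can be reviewed or fed to whatever HTTP client is already in use.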

Here is a repost of something I wrote two days ago:

Elasticsearch has been throttling recovery I/O since version 0.90. The defaults are fairly low. Try increasing the indices.recovery.max_bytes_per_sec setting.


You can also increase the number of shards that are recovered at the same time. The default is 2. Increase either value too much and you will have long IO waits.
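
As a rough illustration, both knobs can be changed together through the cluster settings API (`PUT /_cluster/settings`). The sketch below only builds the request body. The concurrency setting name is the one given later in this thread (cluster.routing.allocation.node_concurrent_recoveries), the values `"100mb"` and `4` are arbitrary placeholders rather than recommendations, and `recovery_settings_body` is a hypothetical helper.

```python
import json


def recovery_settings_body(max_bytes_per_sec, concurrent_recoveries):
    """Build a transient cluster-settings body for the two recovery knobs."""
    if concurrent_recoveries < 1:
        raise ValueError("concurrent_recoveries must be at least 1")
    return json.dumps(
        {
            "transient": {
                # Throttle on recovery traffic, e.g. "100mb".
                "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
                # Number of shards recovered at the same time per node.
                "cluster.routing.allocation.node_concurrent_recoveries": concurrent_recoveries,
            }
        },
        indent=2,
    )


# Placeholder values; raise them gradually and watch for IO waits.
print(recovery_settings_body("100mb", 4))
```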

Cheers,

Ivan



Re: Restarting of node taking much time

Ankit Jain
Thanks Ivan.
 
We are planning to store TBs of data in the Elasticsearch cluster. I don't think indices.recovery.max_bytes_per_sec is going to help me much.

Are there any specific files that an ES node requires during restart, or any specific metadata information required during restart?
 
Also, can you suggest some alternative ways to improve node recovery time?
 
Regards,
Ankit Jain
 

Re: Restarting of node taking much time

joergprante@gmail.com
You have decided to put 200 shards on a single node. Such a high number combined with a significant shard size cannot be recovered very quickly. Although you can put many thousands of shards on a single node and shards can grow to many hundreds of GB, you always have a price to pay when the shards start moving around between nodes at recovery time.

The improvement depends on how much you want to stress a node while recovering. The default settings are chosen wisely so that when a node comes up, it can always respond to search and index requests immediately during recovery. It does not confuse clients with timeouts, and it does not confuse sysadmins with iowaits. But there is a long down time, and this down time is directly determined by the number of shards per node and the current shard size.

So if you want to stress your nodes, you can select a higher value in cluster.routing.allocation.node_concurrent_recoveries. 


Pros:
- recovery may be faster

Cons:
- recovery takes many network resources
- queries and indexing may not respond in time
- higher iowaits
- network bandwidth must be available

The best method to achieve quick recovery is to choose a sensible shards-per-node ratio and a sane shard size.

Here is my approach: on my 32-core machines, I plan to never run more than 32 shards, and no shard shall grow beyond 5-10 GB, so transporting a shard over a 10 Gbit/s link takes a reasonable time.
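
To put numbers on that, here is a back-of-the-envelope sketch of the best-case copy time at full line rate. It assumes decimal units (1 GB = 8 Gbit), ignores recovery throttling, disk speed, and protocol overhead, and `transfer_seconds` is a hypothetical helper, so real recoveries will be slower.

```python
def transfer_seconds(size_gb, link_gbit_per_s):
    """Best-case seconds to copy size_gb gigabytes over the given link."""
    if link_gbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    return size_gb * 8 / link_gbit_per_s


LINK_GBIT_PER_S = 10  # link speed from the post above

# Single shards at the lower and upper end of the 5-10 GB range.
for shard_gb in (5, 10):
    print(f"{shard_gb} GB shard: {transfer_seconds(shard_gb, LINK_GBIT_PER_S):.0f} s")

# A full node under this plan: 32 shards of 10 GB each.
node_gb = 32 * 10
print(f"{node_gb} GB node: {transfer_seconds(node_gb, LINK_GBIT_PER_S):.0f} s")
```

Even a fully loaded node under this plan moves in a few minutes at line rate, which is why the shard count and shard size matter more than any single tuning setting.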

Jörg


Re: Restarting of node taking much time

shadyabhi
In reply to this post by Ankit Jain
How are you restarting your nodes? Are you using init scripts or
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html
? Using the API seems to do pretty fast recovery for me.

--
Regards,
Abhijeet Rastogi (shadyabhi)


Re: Restarting of node taking much time

Ankit Jain
Thanks Jörg and Abhijeet.

@Abhijeet, we are considering the scenario of server crashes.
Also, the amount of data served by each shard is around 100 GB.

Regards,
Ankit Jain

