increasing shards and then nodes


Lee Parker
We have been using Elasticsearch in our production environment for a few months now. I have discovered that I didn't properly plan for the amount of data in the index, and I find myself needing to increase the number of shards to improve performance. I am also going to increase the number of nodes. I know that you can't just increase the number of shards without reindexing the data. The difficulty with reindexing is that our data currently resides in a highly normalized system, and de-normalizing each item into the document that ES indexes can take a long time (2.5 weeks last time).

In that case, here is my plan for increasing my shard count:
1. Spin up a new ES server, making sure it doesn't join the existing server as a node.
2. Pull the list of documents from my data store, fetch each document from the existing ES server, and put it on the new one.
3. Once all the data is in the new ES server, shut down the old one, wipe its data, and start it up as a new node joining the new ES server.

Is there a better way to keep things running while increasing the shard count? We are currently using the 5-shard default, and all of the shards are around 12GB. I am thinking of setting it to 15 shards and 1 (possibly 2) replicas. Does this make sense? We will have a 3-node cluster by the end of the month and will likely grow it as we see fit throughout the year.

Lee
--

"It doesn't matter whether you are liberal or conservative, but it's dangerous to always think with exclamation points instead of question marks."
by Marty Beckerman


Re: increasing shards and then nodes

Clinton Gormley
Hi Lee
>
> In that case, here is my plan for increasing my shard count:
> 1. Spin up a new ES server but make sure it doesn't join the existing
> server as a node.
> 2. Pull the list of documents from my data and then get the documents
> from the existing ES server and put them on the new one.
> 3. Once all the data is in the new ES server, shutdown the older one
> wipe its data and then start it up as a new node to join the new ES
> server.

Alternatively, instead of messing with a new server, you could:
 - create a new index 'new_index_timestamp' on the same ES server
 - index from 'my_index' to 'new_index_timestamp'
 - delete 'my_index'
 - create alias 'my_index' pointing to 'new_index_timestamp'

The benefit of switching to aliases is that it will make future changes to your index much easier.
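Clinton's steps can be sketched as a sequence of REST calls. This is only an illustrative helper (the function and index names are ours, not an ES API): it builds request descriptions rather than executing them, and the bulk copy in step 2 is elided since any scan-and-bulk-index approach works there.

```python
def alias_swap_requests(old_index, new_index, shards, replicas):
    """Describe the REST calls for the reindex-and-alias approach.

    Returns (method, path, body) tuples for each step. The actual
    document copy between indices is elided.
    """
    return [
        # 1. Create the new index with the desired shard count.
        ("PUT", f"/{new_index}", {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }),
        # 2. (Reindex documents from old_index into new_index here.)
        # 3. Drop the old index so its name is free to reuse.
        ("DELETE", f"/{old_index}", None),
        # 4. Point an alias with the old name at the new index, so
        #    clients keep querying the same name they always did.
        ("POST", "/_aliases", {
            "actions": [
                {"add": {"index": new_index, "alias": old_index}}
            ]
        }),
    ]
```

Because the old name survives as an alias, this same swap can be repeated any time the shard count or mappings need to change again.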

clint





Re: increasing shards and then nodes

kimchy
Administrator
The 5-shard default allows you to grow up to 5 nodes in terms of index capacity, and to improve search capacity you can simply add more replicas to the cluster dynamically. So with a 5-shard, 1-replica setup, you will max out at 10 nodes (nothing will be left to allocate to an 11th node if one is started).

It sounds like you are planning to grow from 1-2 nodes to 3, which is still well below the capacity provided by 5 shards and 1 replica. A 12GB index is not a big one.
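Shay's capacity rule is simple arithmetic: every primary shard and every replica copy can occupy its own node, and past that point extra nodes host nothing. A small illustrative helper (the function name is ours, not an ES API):

```python
def max_usable_nodes(shards: int, replicas: int) -> int:
    """Maximum node count at which every node still hosts at least
    one shard copy. Each of the `shards` primaries plus each of its
    `replicas` copies can sit on a different node; beyond that,
    additional nodes receive no shards."""
    return shards * (1 + replicas)
```

So Lee's proposed 15 shards with 1 replica would leave room to grow to 30 nodes, or 45 with 2 replicas.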

-shay.banon



Re: increasing shards and then nodes

Lee Parker
It was my understanding, based on some other threads, that performance will degrade once shards grow beyond 10GB. If that is not true, then I will just stick with the 5-shard, 1-replica setup.

Lee




Re: increasing shards and then nodes

kimchy
Administrator
It really depends on what you plan to do with it. Shards can certainly be bigger than 10GB. If you use things like facets or sorting, the values of the faceted/sorted field for each doc are loaded into memory, which means you might need more memory as the shard's doc count grows. But that growth is linear, so it can easily be derived from capacity planning done at a smaller size.
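The linear growth Shay describes means field-data memory can be projected from a measurement taken on a smaller index. A hedged sketch of that capacity-planning step (the function name and all numbers below are made up for illustration):

```python
def projected_field_cache_mb(measured_mb: float,
                             measured_docs: int,
                             projected_docs: int) -> float:
    """Linearly extrapolate field-data memory from a measurement on a
    smaller index: per-doc cost is assumed constant, so memory scales
    with document count."""
    per_doc_mb = measured_mb / measured_docs
    return per_doc_mb * projected_docs
```

For example, if 1M docs were measured to use 100MB of field data for a sorted field, 5M docs would be expected to need roughly 500MB per shard copy holding them.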

-shay.banon





Re: increasing shards and then nodes

Lee Parker
We regularly sort results on a field we call "date", which contains a Unix timestamp as an integer, and we are already seeing slow results from filtered and sorted queries. Given that we currently have about 60GB of data that we regularly run filtered and sorted queries against, should we plan on a greater number of shards?

Lee






Re: increasing shards and then nodes

kimchy
Administrator
Yes, as long as you also plan to add more capacity in terms of the number of servers/nodes.







Re: increasing shards and then nodes

kimchy
Administrator
And make sure you have enough memory allocated to the nodes as well. You can check how much memory the field data cache is using with the node stats API.
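As a sketch of reading that node stats output: the JSON below is a trimmed, hypothetical payload, and the exact key names and layout vary across ES versions, so treat the field names here as assumptions rather than the real response format.

```python
import json

# Trimmed, hypothetical node-stats response; a real payload has many
# more sections and its layout differs between ES versions.
SAMPLE_STATS = json.loads("""
{
  "nodes": {
    "node1": {
      "indices": {
        "cache": {
          "field_size_in_bytes": 734003200
        }
      }
    }
  }
}
""")

def field_cache_bytes(stats: dict) -> dict:
    """Extract the field-data cache size per node from a node-stats
    payload shaped like SAMPLE_STATS above."""
    return {
        name: node["indices"]["cache"]["field_size_in_bytes"]
        for name, node in stats["nodes"].items()
    }
```

Comparing that number against the configured heap per node shows how close the field data alone comes to exhausting memory.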








Re: increasing shards and then nodes

Lee Parker
The current node has 1.2GB available to the heap and shows that it is using 1GB of that. I will be building out to at least three nodes in the next few weeks. Is 1.2GB of heap enough when we have shards of this size or larger?

Lee









Re: increasing shards and then nodes

Karussell
In reply to this post by Lee Parker
Is creating a new index an option?

Querying several indices at once is easy ...
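Karussell's point is that a search can span several indices just by listing them in the URL path. A tiny illustrative helper (the index names and function name are hypothetical):

```python
from urllib.parse import quote

def multi_index_search_path(indices, query_string):
    """Build a search path spanning several indices at once,
    e.g. /idx_2011_01,idx_2011_02/_search?q=... . The query string
    is URL-encoded so characters like ':' survive the trip."""
    return "/%s/_search?q=%s" % (",".join(indices), quote(query_string))
```

So rather than reindexing everything into one bigger index, new data could flow into a fresh index and queries could simply name both.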


Re: increasing shards and then nodes

Karussell
Oops, I didn't see Shay's answers. So, Lee: forget about mine ;)


Re: increasing shards and then nodes

Clinton Gormley
In reply to this post by Lee Parker
On Thu, 2011-01-20 at 11:26 -0600, Lee Parker wrote:
> The current node has 1.2g available to the heap and shows that it is
> using 1g of that.  I will be building out to at least three nodes in
> the next few weeks.  Is 1.2g of heap enough when we have shards of
> this size or larger?

I would say no. In fact, I'm really surprised you haven't had out-of-memory issues with that amount of data and so little memory.

We have 14GB of data divided into 5 shards with one replica, and we have two nodes, each with 14GB of heap assigned to Elasticsearch.

The more data that can live in memory, the faster ES will perform.

clint