Index Scaling Question


Index Scaling Question

dallasmahrt
We are evaluating ElasticSearch for our search solution, and I had a
question about scaling indexes. Based on my research thus far, once an
index is created, the shard count and replication factor are fixed. I
read in another post that if capacity is exceeded, the recommended
approach is to build a new index from the initial sources with a higher
shard count or a higher replication factor (depending on the type of
capacity being exceeded). The nature of our system makes this
difficult, since we do not retain much of the data we are indexing and
cannot re-acquire that data easily.

My question is whether there is any supported means of creating a new
index from an existing index with a new shard count or replication
factor. If not, is there a plan to add this support? If so, how?

Thanks for any help you can offer.

--
Dallas Mahrt

Re: Index Scaling Question

kimchy
Administrator
Hi,

   Good question. First, you can always create more shards than you expect to scale to. For example, a cluster with 20 shards and 1 replica can scale out to 40 machines before it reaches its limit. Bump that up to 40 shards with one replica, and you can scale up to 80.
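The arithmetic behind over-allocating shards can be sketched in a few lines. This is a hypothetical helper, not part of elasticsearch: with each machine hosting at most one copy of a shard (primary or replica), the cluster tops out at shards × (1 + replicas) machines.

```python
def max_machines(shards: int, replicas: int) -> int:
    """Upper bound on machines that can hold work for one index,
    assuming each machine hosts at most one shard copy."""
    return shards * (1 + replicas)

print(max_machines(20, 1))  # 40 machines, as in the example above
print(max_machines(40, 1))  # 80 machines
```

Past that bound, adding machines no longer spreads the index further, which is why choosing a generous shard count up front buys headroom.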

   Second, by default, elasticsearch stores the source document as well. So you can always start a scrolled search and reindex the data. As discussed in another post, the reindex capability can be provided out of the box if the source is there.
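The scroll-and-reindex pattern can be sketched generically. The batch iterator and the indexing function below are in-memory stand-ins for real client calls, not an actual elasticsearch API:

```python
def reindex(scroll_batches, index_doc):
    """Drain a scrolled search batch by batch and feed every stored
    _source document into a new index via index_doc(). Returns the
    number of documents copied."""
    copied = 0
    for batch in scroll_batches:        # each batch: a list of hits
        for hit in batch:
            index_doc(hit["_id"], hit["_source"])
            copied += 1
    return copied

# Toy run with in-memory stand-ins for the old and new index:
batches = [[{"_id": "1", "_source": {"title": "a"}},
            {"_id": "2", "_source": {"title": "b"}}],
           [{"_id": "3", "_source": {"title": "c"}}]]
new_index = {}
copied = reindex(iter(batches),
                 lambda _id, src: new_index.update({_id: src}))
print(copied)  # 3
```

In a real cluster the batches would come from a scrolled search against the old index, and `index_doc` would issue index requests against the new one.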

   There is an option to add "resharding", but that's quite a complex task to accomplish in a distributed system (with the Dynamo model out of the picture, since it does not really apply to search...). It can be implemented under certain restrictions, but it's not something that I have on my plate for the near future.
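One way to see why resharding is hard: documents are routed to a shard by hashing, so changing the shard count silently moves most documents. A minimal sketch of hash-modulo routing (crc32 here is illustrative, not elasticsearch's actual routing function):

```python
import zlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Hash-modulo routing: deterministic shard placement per document.
    return zlib.crc32(doc_id.encode()) % num_shards

ids = [f"doc-{i}" for i in range(1000)]
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in ids)
print(f"{moved} of 1000 documents change shard going from 5 to 6 shards")
```

Most documents end up on a different shard after the change, so "just adding a shard" would require relocating the bulk of the index anyway.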

   Last, the idea of multi-index support in elasticsearch tries to address this. For example, if you index data based on time, you can create an index per month. Each index has its own number-of-shards and number-of-replicas settings (though you might as well have them set to a fixed value). This is not complete, of course, without the ability to search across multiple indices, which elasticsearch provides ;). This gives you the added benefit of doing faster searches on more recent months, then going backwards if you don't find anything relevant.

   The partitioning above does not have to be by month, of course; you can choose how you partition it. The benefit is that you can scale out endlessly in this scenario.
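The time-partitioned layout can be sketched as a naming convention plus a multi-index search target. The `logs-` prefix and index names are made up for illustration:

```python
from datetime import date

def index_for(day: date, prefix: str = "logs") -> str:
    """Name of the monthly index a document belongs in."""
    return f"{prefix}-{day.year:04d}-{day.month:02d}"

def search_target(indices: list) -> str:
    """Comma-separated index list of the kind a multi-index search
    accepts, newest month first."""
    return ",".join(indices)

recent = [index_for(date(2010, 5, 7)), index_for(date(2010, 4, 7))]
print(search_target(recent))  # logs-2010-05,logs-2010-04
```

Each month you create a fresh index with whatever shard and replica settings you like, then search recent indices first and widen the target only when needed.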

   Really last point ;): if you are concerned about scaling search, the ability to dynamically change the replica count is something that I plan to add (replicas serve search/count/get requests). You can actually do it currently by shutting down the cluster, changing the number of replicas in the gateway metadata, and starting it up again, but that's a hack ;).

cheers,
shay.banon

On Fri, May 7, 2010 at 12:19 AM, Dallas Mahrt <[hidden email]> wrote: