How many indices in 0.13?


How many indices in 0.13?

John Chang
In the 0.13 release notes you write there is "Improved Support for Large Number of Indices."  How can we get a sense of how many indices can be wisely supported, and when we've gone too far?  I'm guessing it would be important to consider how many indices you can leave open at once, and how long it takes to open one up.

We index documents for users; they only need to be able to search against their own data.  We've always included a user identifier in the documents and put them all in the same index.  But with the 0.13 announcement we are wondering if we can have one index per user and speed up search performance.

Or, would a better way to accomplish the same search performance improvement be to use routing based on a custom value (i.e., the user identifier), as discussed in the "Improved Support for Large Number of Indices" section?

Thanks and congrats on the 0.13 release!

Re: How many indices in 0.13?

kimchy
Administrator

Hey,

Let me try to explain the improvement a bit, so we have a better picture. In elasticsearch, there are two main "states" that are stored: the cluster state, and the state each node holds.

The cluster state includes all the indices metadata (settings and mapping sources, in JSON form) and the routing information.

The node state includes index level state (the parsed mappings tree structure, for fast Lucene Document creation) and shard level state (the Lucene constructs used to index and search data, since each shard is a Lucene index).

Pre 0.13, the index level node state was created on every node, including client nodes. 0.13 includes a big refactoring that makes it possible to index and search using just the cluster state, without needing the node level index data; index level node structures are now created lazily on each node, only when a shard needs to be allocated on it.
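To make the lazy creation concrete, here is a small illustrative sketch (in Python, not elasticsearch's actual Java internals; the class and field names are made up for the example): node-level structures for an index are built only when a shard of that index is first allocated to the node.

```python
# Illustrative sketch of the 0.13 change: node-level index structures
# (parsed mappings, Lucene constructs) are built lazily, on first shard
# allocation, instead of eagerly on every node for every index.

class Node:
    def __init__(self, name):
        self.name = name
        self.index_state = {}  # index name -> node-level structures

    def allocate_shard(self, index, shard_id):
        # Pre-0.13 behaviour would have built state for every index in
        # the cluster up front; now it only happens here, on demand.
        if index not in self.index_state:
            self.index_state[index] = {"parsed_mappings": ..., "shards": set()}
        self.index_state[index]["shards"].add(shard_id)

node = Node("node-1")
node.allocate_shard("user-index-42", 0)

# Only the index that actually has a shard on this node pays any overhead:
print(sorted(node.index_state))  # → ['user-index-42']
```

A node holding shards of only a few indices thus stays cheap, no matter how many indices exist in the cluster as a whole.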

What does this mean? It means that with a big enough cluster, a node that never gets allocated a shard of a specific index will not incur the overhead of creating that index's structures.

What does it mean for many indices? It means that for cases with a wide distribution of indices, where many nodes end up with no shards allocated for a specific index, the overhead will be much smaller.

A shard is still a Lucene index, though, so if you have 2-3 nodes with many indices, you will still run into the same potential problems (simply from creating many Lucene indices on the same node).



Regarding routing: yes, it can really help solve the problem. Use the user name as the routing value when indexing, and provide the same user name as the routing value when searching (you will still need to filter by it); the search will then hit a single shard instead of all shards.
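The principle behind this can be sketched as follows (a simplified Python illustration; elasticsearch's actual hash function differs, but the modulo-over-primary-shards idea is the same):

```python
# Sketch of why custom routing lets a search hit a single shard:
# the routing value is hashed and taken modulo the number of primary
# shards, so one routing value always maps to one shard.
import hashlib

NUM_PRIMARY_SHARDS = 5  # assumed example value, fixed at index creation

def shard_for(routing: str) -> int:
    digest = hashlib.md5(routing.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PRIMARY_SHARDS

# All of one user's documents, indexed with routing=<user name>,
# land on the same shard...
docs = [("john", f"doc-{i}") for i in range(10)]
assert len({shard_for(user) for user, _ in docs}) == 1

# ...so a search that passes routing="john" only needs to query that
# one shard instead of all NUM_PRIMARY_SHARDS shards. You still filter
# by user, because other users can hash to the same shard.
```

Note the last point: routing narrows the search to a shard, not to a user, which is why the filter on the user identifier is still required.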

cheers,
-shay.banon

On Saturday, November 20, 2010 at 6:41 AM, John Chang wrote:


In the 0.13 release notes you write there is "Improved Support for Large
Number of Indices." How can we get a sense of how many indices can be
wisely supported, and when we've gone too far? I'm guessing it would be
important to consider how many indices you can leave open at once, and how
long it takes to open one up.

We index documents for users; they only need to be able to search against
their own data. We've always included a user identifier in the documents
and put them all in the same index. But with the 0.13 announcement we are
wondering if we can have one index per user and speed up search performance.

Or, would a better way to accomplish the same search performance improvement
be to use the routing based on custom value (ie. user identifier) discussed
in the "Improved Support for Large Number of Indices" section?

Thanks and congrats on the 0.13 release!
--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/How-many-indices-in-0-13-tp1934624p1934624.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.