How is performance affected on distribution of data over multiple indices

narinder.izap
Hi there,

We are going to have billions of records in ES, so we plan to distribute the data over multiple indices. I want to know how this will affect performance, and which of the following options will perform better:

# Single Index with data distributed in multiple types
OR 
# Multiple Indices

For example: 
Multiple Indices:

Index 1:
"user": all data related to users.

Index 2:
"media": all media related to users (photos, videos, etc.), linked by some id to the user documents stored in Index 1.

Index 3:
"comments": all comments related to the users and media stored in the two indices above.

Single Index

"master_index"

type: user, media, comments
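
To make the two options concrete, here is a minimal sketch of where a document of each kind would be stored under each layout. The function name `target` and the layout labels "single" and "multiple" are hypothetical, and reusing the kind as the type name in the multiple-indices layout is an assumption, since the example above only names the indices.

```python
# Kinds of documents taken from the example above.
KINDS = ("user", "media", "comments")


def target(layout, kind):
    """Return the (index, type) pair a document of `kind` would go to.

    `layout` is "single" (one "master_index" with three types) or
    "multiple" (one index per kind). Both labels are illustrative.
    """
    if kind not in KINDS:
        raise ValueError("unknown kind: %r" % (kind,))
    if layout == "single":
        return ("master_index", kind)
    if layout == "multiple":
        # Assumption: the type name simply mirrors the index name.
        return (kind, kind)
    raise ValueError("unknown layout: %r" % (layout,))


single_targets = [target("single", k) for k in KINDS]
multiple_targets = [target("multiple", k) for k in KINDS]
```

In both layouts the link between a media or comment document and its user is still just an id stored in the document; only the index and type it is addressed under change.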


Re: How is performance affected on distribution of data over multiple indices

Drew Raines
Narinder Kaur wrote:

> We are going to have billions of records in the es, so we have plan
> to distribute the data over multiple indices, I want to know how it
> is going to affect the performance, which of the options will be
> better in terms of performance,
>
> # Single Index with data distributed in multiple types
> OR 
> # Multiple Indices

I would go for single index with types.  That will give you more
flexibility in relating & retrieving the data.

There's no inherent limit to the size of an index.  It's just an
abstraction on top of one or more shards.  You'll want to tune the
number of shards based on how big your docs are, how many nodes,
etc.
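
As a rough illustration of "an abstraction on top of one or more shards", the sketch below builds an index-creation settings body and spreads documents over shards by hashing a routing key modulo the shard count. The shard and replica counts are placeholder values, the helper names are hypothetical, and CRC32 is only a deterministic stand-in, not the hash function Elasticsearch actually uses.

```python
import zlib


def index_settings(num_shards, num_replicas=1):
    """Build a settings body for creating an index (values are illustrative)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": num_shards,
                "number_of_replicas": num_replicas,
            }
        }
    }


def pick_shard(routing_key, num_shards):
    """Stand-in for shard routing: hash the key, then take it modulo the shard count.

    CRC32 is used only because it is deterministic across runs; it is an
    assumption for this sketch, not Elasticsearch's real hash.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


body = index_settings(num_shards=5)
# Every document id lands on exactly one of the 5 shards.
shards = [pick_shard("user-%d" % i, 5) for i in range(1000)]
```

Because the shard is derived from the number of primary shards, that number is something to settle before indexing billions of records rather than afterwards.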

-Drew

Re: How is performance affected on distribution of data over multiple indices

Karussell
It also depends on how often you'll update the index and how quickly
updated docs should show up in searches.

If you frequently update a lot of data and need the changes to show up
in (near) real time, then you should prefer smaller indices.
Smaller indices also give you the flexibility (at the cost of more work)
to split indices not only by type but also by date (e.g. for comments).
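
A minimal sketch of the split-by-date idea, assuming one index per kind per month: the naming pattern `kind-YYYY.MM` and both helper names are hypothetical choices for illustration, not something prescribed in this thread.

```python
from datetime import date
from typing import List


def index_for(kind: str, day: date) -> str:
    """Name of the monthly index a document of `kind` dated `day` goes to."""
    return "%s-%04d.%02d" % (kind, day.year, day.month)


def indices_between(kind: str, start: date, end: date) -> List[str]:
    """All monthly index names covering the inclusive range start..end."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append("%s-%04d.%02d" % (kind, year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


current = index_for("comments", date(2011, 10, 28))
recent = indices_between("comments", date(2011, 11, 15), date(2012, 1, 3))
```

A search over a date range would then only need to target the names returned by `indices_between`, for instance joined into a comma-separated list, which is the extra work traded for smaller, faster-updating indices.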

Peter.

On 28 Oct., 20:52, Drew Raines <[hidden email]> wrote:

> Narinder Kaur wrote:
> > We are going to have billions of records in the es, so we have plan
> > to distribute the data over multiple indices, I want to know how it
> > is going to affect the performance, which of the options will be
> > better in terms of performance,
>
> > # Single Index with data distributed in multiple types
> > OR 
> > # Multiple Indices
>
> I would go for single index with types.  That will give you more
> flexibility in relating & retrieving the data.
>
> There's no inherent limit to the size of an index.  It's just an
> abstraction on top of one or more shards.  You'll want to tune the
> number of shards based on how big your docs are, how many nodes,
> etc.
>
> -Drew