How to create user indexes on the fly

Paul Loy
Hi,

I have finally got a use-case for per-user indexing. I was wondering what people's opinions are on the best way to check for, and then create, an index and type mapping on-the-fly.

i.e.
  1. get data
  2. check if index exists and create if not
  3. check if mapping exists and create if not
  4. index data
How should I go about 2 and 3 in an optimal way?

Thanks in advance,

Paul

(PS I never know when to use indexes or indices)

--
---------------------------------------------
Paul Loy
[hidden email]
http://justgiving.com/thetrafalgarway - 300 miles, 2 bicycles, 36 hours

Re: How to create user indexes on the fly

Clinton Gormley
Hiya

> I have finally got a use-case for per-user indexing. I was wondering
> what people's opinions are on the best way to check and then create an
> index and type mapping on-the-fly.
>
Don't forget that each shard is a Lucene instance, so if you have a
million users, you will need a LOT of boxes and memory to cope with
that.

> i.e.
>      1. get data
>      2. check if index exists and create if not
>      3. check if mapping exists and create if not
>      4. index data
> How should I go about 2 and 3 in an optimal way?
>
You can get a list of all known indices, but that may not be terribly
efficient.

It may be better to just try it and catch the error, e.g.:
 - create the index -> catch the error if it already exists
 - put the mapping -> should just work, if the mapping doesn't exist
   or hasn't changed
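
A minimal sketch of the try-and-catch flow described above, in Python. `FakeCluster`, `IndexAlreadyExists`, the `user-<id>` naming scheme, and the sample mapping are hypothetical stand-ins used only so the example runs on its own; with a real client you would catch whatever "already exists" error that client raises instead.

```python
class IndexAlreadyExists(Exception):
    """Hypothetical error: raised when the index name is already taken."""


class FakeCluster:
    """In-memory stand-in for a cluster (an assumption, not a real client)."""

    def __init__(self):
        self.mappings = {}   # index name -> {doc type -> mapping}
        self.documents = {}  # index name -> list of (doc type, document)

    def create_index(self, name):
        if name in self.mappings:
            raise IndexAlreadyExists(name)
        self.mappings[name] = {}
        self.documents[name] = []

    def put_mapping(self, index, doc_type, mapping):
        # Re-sending an identical mapping simply overwrites it with itself.
        self.mappings[index][doc_type] = mapping

    def index_document(self, index, doc_type, document):
        self.documents[index].append((doc_type, document))


def index_for_user(cluster, user_id, doc_type, mapping, document):
    """Create the index if needed, put the mapping, then index the data."""
    index = f"user-{user_id}"
    try:
        cluster.create_index(index)
    except IndexAlreadyExists:
        pass  # The index was created earlier; carry on.
    cluster.put_mapping(index, doc_type, mapping)
    cluster.index_document(index, doc_type, document)
    return index


cluster = FakeCluster()
mapping = {"properties": {"title": {"type": "string"}}}  # illustrative only
index_for_user(cluster, 42, "note", mapping, {"title": "first"})
index_for_user(cluster, 42, "note", mapping, {"title": "second"})
print(sorted(cluster.mappings), len(cluster.documents["user-42"]))
```

The second call hits the "already exists" branch and still indexes its document, so no separate existence check is needed before each write.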

clint



Re: How to create user indexes on the fly

Paul Loy
Thanks Clint. As always, answers tend to create more questions!

> > I have finally got a use-case for per-user indexing. I was wondering
> > what people's opinions are on the best way to check and then create an
> > index and type mapping on-the-fly.
>
> Don't forget that each shard is a Lucene instance, so if you have a
> million users, you will need a LOT of boxes and memory to cope with
> that.

Hmm... so are per-user indexes not a good idea? We are expecting lots of users. Will ES definitely keep an instance running for each index even if that index has not been written to or read from for a while?



Re: How to create user indexes on the fly

Clinton Gormley
Hi Paul

> > Don't forget that each shard is a Lucene instance, so if you have a
> > million users, you will need a LOT of boxes and memory to cope with
> > that.
>
> Hmm... so are per user indexes not a good idea? We are expecting lots
> of users. Will ES definitely keep an instance running for each index
> even if that index has not been written to or read from for a while?
>
"Not a good idea" depends on your application, really, but you say that
you will have lots of users, so:

My understanding is that yes, at least one primary shard must be alive
for each index. I say "at least one", although by default you would have
more than one. (You can specify the number of shards at index creation
time.)
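
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The figures (five primary shards and one replica per index, one million users) are illustrative assumptions, not measurements from this thread; substitute your own settings.

```python
def total_shards(num_indices, primaries_per_index, replicas_per_primary):
    """Count shard copies; each primary and each replica is a Lucene instance."""
    return num_indices * primaries_per_index * (1 + replicas_per_primary)


# Assumed figures, for illustration only.
per_user_indexes = total_shards(1_000_000, 5, 1)
single_shared_index = total_shards(1, 5, 1)
print(per_user_indexes, single_shared_index)  # 10000000 10
```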

Why not just store a user_id in each document that needs to be filtered
by user?  It will be way more efficient.
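
A sketch of that shared-index approach: every document carries a `user_id` field and each search adds a filter on it. The request body below assumes the `bool`/`term` query DSL of recent Elasticsearch releases (the syntax at the time of this thread differed), and the field names `user_id` and `body` are hypothetical.

```python
import json


def user_scoped_query(user_id, text):
    """Build a search body that matches `text` within one user's documents only."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"body": text}}],
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


body = user_scoped_query("paul", "bicycle")
print(json.dumps(body, indent=2))
```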

clint



Re: How to create user indexes on the fly

ppearcy
Another thought (and apologies for confusing the conversation by
adding another Paul): if you need per-user mappings, you could have a
single index with as many shards as needed and make each user their own
document type, since mappings are defined per document type.

Regards,
Paul P

On Aug 11, 9:54 am, Clinton Gormley <[hidden email]> wrote:

> Hi Paul
>
> > Don't forget that each shard is a Lucene instance, so if you have a
> > million users, you will need a LOT of boxes and memory to cope with
> > that.
>
> > Hmm... so are per user indexes not a good idea? We are expecting lots
> > of users. Will ES definitely keep an instance running for each index
> > even if that index has not been written to or read from for a while?
>
> "not a good idea" depends on your application, really, but you say that
> you will have lots of users, so:
>
> My understanding is that yes, at least one primary shard must be alive
> for each index. I say "at least one" although by default you would have
> more than one. (you can specify this at index creation time)
>
> Why not just store a user_id in each document that needs to be filtered
> by user?  It will be way more efficient.
>
> clint

Re: How to create user indexes on the fly

Paul Loy
Ah, that's interesting. Are there any overheads in doing this rather than adding a userId field, as Clint suggested?

I guess this will make searches quick, since by using the type you're essentially pre-filtering.

Thanks Paul and Clint!

On Wed, Aug 11, 2010 at 5:05 PM, Paul <[hidden email]> wrote:
> Another thought (and apologies for confusing the conversation by
> adding another Paul): if you need per-user mappings, you could have a
> single index with as many shards as needed and make each user their own
> document type, since mappings are defined per document type.
>
> Regards,
> Paul P


Re: How to create user indexes on the fly

kimchy
Administrator
Hi,

   It's basically the same: multi-type support within an index is implemented by adding a _type field to each indexed document and automatically filtering by it when applicable.
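
A toy model of that behaviour, in plain Python rather than Elasticsearch itself: a search scoped to a type is treated as the same search with one extra filter on `_type`. The documents and the helper `search` are made up for illustration.

```python
DOCS = [
    {"_type": "paul", "user_id": "paul", "body": "300 miles by bicycle"},
    {"_type": "clint", "user_id": "clint", "body": "bicycle repair"},
    {"_type": "paul", "user_id": "paul", "body": "index per user"},
]


def search(docs, word, **filters):
    """Return docs whose body contains `word` and whose fields equal `filters`."""
    return [
        d for d in docs
        if word in d["body"] and all(d.get(k) == v for k, v in filters.items())
    ]


# Scoping by type and filtering by a user_id field select the same documents.
by_type = search(DOCS, "bicycle", _type="paul")
by_user_id = search(DOCS, "bicycle", user_id="paul")
print(by_type == by_user_id)  # True
```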

-shay.banon

On Wed, Aug 11, 2010 at 7:15 PM, Paul Loy <[hidden email]> wrote:
> Ah, that's interesting. Are there any overheads in doing this rather than adding a userId field, as Clint suggested?
>
> I guess this will make searches quick, since by using the type you're essentially pre-filtering.
>
> Thanks Paul and Clint!

