index per company - any alternatives?

MoD
Hi,

We run a SaaS CRM service. We set up Elasticsearch with one index per company (for example, abc-company has its own index and xyz-company has its own index).

With more than 1000 companies, we now suspect this is not the right setup.

In particular, when Elasticsearch restarts (after a failure) it goes into recovery, and recovering 1000+ indices with 5 shards each takes forever (at 100% CPU).

Any ideas for a correct setup?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: index per company - any alternatives?

joergprante@gmail.com
You can over-allocate shards.

https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

Here are the docs for indices aliases with routing:
http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html

Jörg
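A minimal sketch of what the alias approach could look like, assuming a single shared index (called `crm` here) and a `company_id` field on every document; both names are hypothetical and not from this thread. It only builds the JSON body for the `_aliases` endpoint described in the docs linked above, with one alias per company that carries both a routing value and a term filter. The term filter assumes `company_id` is mapped so that the value is stored as a single exact token.

```python
import json


def company_alias_actions(company, index="crm", field="company_id"):
    """Build the body for POST /_aliases that adds one routed, filtered alias.

    `index` and `field` are illustrative names, not taken from the thread.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": company,
                    # Routing sends the company's requests to a single shard.
                    "routing": company,
                    # The filter hides other companies that share that shard.
                    "filter": {"term": {field: company}},
                }
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(company_alias_actions("abc-company"), indent=2))
```

With such an alias in place, the application could keep addressing "abc-company" as if it were its own index, while the data lives in one shared index.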

On 05.03.13 at 13:39, MoD wrote:

> Hi,
>
> We are running a saas crm service. We set up elasticsearch to create
> an index per company (for example abc-company has its own index,
> xyz-company has its own index.).
>
> But after 1000+ company we are suspecting this may not be a correct
> setup.
>
> Especially when elasticsearch restarts (due to a failure) it starts
> with recovery and 1000+ index with 5 shards recovery takes forever
> (and %100 cpu).
>
> Any ideas for a correct setup?




Re: index per company - any alternatives?

Michael Sick
Can you explain more about the nature of the data? If it's not time-based, then, as Jörg suggests, a single index with sharding and routing could be the answer.
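To make the idea concrete, here is a rough sketch of why one over-allocated index with routing recovers fewer shards than an index per company. The shard count of 50 is a made-up example, and `zlib.crc32` is only a stand-in for whatever hash Elasticsearch applies to the routing value internally; the point is just that the same routing key always maps to the same shard.

```python
import zlib


def shard_for(routing_key, num_shards):
    """Pick a shard for a routing key (stand-in hash, not Elasticsearch's own)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def shards_to_recover(num_indices, shards_per_index):
    """Total number of shards the cluster has to recover after a restart."""
    return num_indices * shards_per_index


if __name__ == "__main__":
    # Index per company, as described in the original post.
    print(shards_to_recover(1000, 5))
    # One shared index, over-allocated to a hypothetical 50 shards.
    print(shards_to_recover(1, 50))
    for company in ("abc-company", "xyz-company"):
        print(company, shard_for(company, 50))
```

Over-allocating means choosing more shards up front than the current node count needs, so the single index can spread over more nodes later without reindexing.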

On Tue, Mar 5, 2013 at 7:49 AM, Jörg Prante <[hidden email]> wrote:

> You can over-allocate shards.
>
> https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
>
> Here are the docs for indices aliases with routing:
> http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
>
> Jörg


Re: index per company - any alternatives?

MoD
The data consists of the contacts, companies, and notes of each company (domain, user). We use Elasticsearch to index the data and to provide full-text search over it on the site.

It is not time-based and will remain searchable for as long as the company uses the product.

At first we thought that, in order to limit searches to a single company, we should use an index per company.

But as the company/user base grew, we found that index recovery takes too long. By the way, the recovery phase is the sole reason we want to change the setup.



On Tuesday, 5 March 2013 15:18:00 UTC+2, Michael Sick wrote:

> Can you explain more about the nature of the data? If it's not time based, as Joerg suggests using a single index with sharding and routing could be the answer.


Re: index per company - any alternatives?

ppearcy
Are you able to figure out what is eating up the most time during recovery? If you set index.gateway to the DEBUG log level, you should be able to get those details.

One alternative is to tweak index.translog.flush_threshold in the config file. I deal with a decent number of indexes (fewer than you do, though), and lowering this from the default of 5000 to 1000 helped our recovery times. It's a tradeoff, since you will have more merges, but if your indexing volume is small it won't make a difference.

This should only require a cluster restart, not a full data rebuild.

That being said, one index with a routing key on company will definitely help.
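As a rough illustration of routing on company without aliases, the sketch below builds the request paths and a search body that a client might send. The index name `crm`, the type `contact`, and the field `company_id` are assumptions made up for this example. It relies on the `routing` request parameter and on the `filtered` query of the Elasticsearch versions current at the time of this thread; nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def index_path(index, doc_type, doc_id, company):
    """Path for indexing one document, routed by company."""
    return "/%s/%s/%s?%s" % (
        quote(index),
        quote(doc_type),
        quote(str(doc_id)),
        urlencode({"routing": company}),
    )


def search_path(index, company):
    """Path for a search that only visits the company's shard."""
    return "/%s/_search?%s" % (quote(index), urlencode({"routing": company}))


def company_query(text, company, field="company_id"):
    """Full-text query restricted to one company.

    Routing only selects the shard; other companies may live on the same
    shard, so the term filter is still needed.
    """
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": text}},
                "filter": {"term": {field: company}},
            }
        }
    }


if __name__ == "__main__":
    print(index_path("crm", "contact", 42, "abc-company"))
    print(search_path("crm", "abc-company"))
    print(company_query("john", "abc-company"))
```

The same routing value has to be used at index time and at search time, otherwise the search may look at a shard that does not hold the company's documents.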

Best Regards,
Paul

On Tuesday, March 5, 2013 11:20:02 AM UTC-7, MoD wrote:

> The data is the contacts/companies/notes of each company (domain, user). We are using elasticsearch to index the data and full text search the data on the site.
>
> It is not timebased and will remain searchable as long as the company wishes to use the product.
>
> At first we thought that in order to limit the search within the company we should use index per company.
>
> But after the company/user base grew, we found out that recovery of indexes takes too long. By the way, that is the sole reason (the recovery phase) we wish to change the setup.


