Adding 1mln+ aliases is really slow.


Michał Zgliczyński
First of all, thank you for building ElasticSearch. It is a truly awesome product.

I am trying to use the "User data flow" pattern. For this, I create a single index with many aliases pointing to it, one per user. In my use case, I have about 5 million aliases to add.

The alias structure roughly looks like this:
{
  'index' : 'index_name',
  'alias' : 'user_' + user_id,
  'filter' : {
    'term' : {
      'user' : user_id
    }
  },
  'routing' : 'r' + user_id
}
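
For readers who want to reproduce the setup, here is a minimal sketch of how the structure above could be turned into request bodies for the `_aliases` endpoint, 5000 actions at a time. The helper names (`alias_action`, `alias_batches`) and the integer user ids are assumptions made for illustration, not the poster's actual code, and the sketch only builds the bodies without sending them.

```python
import json


def alias_action(index_name, user_id):
    """Build one 'add' action with the filter and routing described above."""
    return {
        "add": {
            "index": index_name,
            "alias": "user_%s" % user_id,
            "filter": {"term": {"user": user_id}},
            "routing": "r%s" % user_id,
        }
    }


def alias_batches(index_name, user_ids, batch_size=5000):
    """Yield request bodies, each holding up to batch_size actions."""
    batch = []
    for user_id in user_ids:
        batch.append(alias_action(index_name, user_id))
        if len(batch) == batch_size:
            yield {"actions": batch}
            batch = []
    if batch:  # last, possibly smaller, batch
        yield {"actions": batch}


# Hypothetical user ids 1..12000, used only to exercise the helpers.
bodies = list(alias_batches("index_name", range(1, 12001)))
print(len(bodies), "request bodies")
print(json.dumps(bodies[0]["actions"][0], sort_keys=True))
```

Each body would then be sent as one POST to `/_aliases`, so every batch triggers a cluster state update on the master.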

I create the index with these settings:
{
  "index_name" : {
    "settings" : {
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "100"
    }
  }
}


Adding aliases works reasonably well up to about 100k aliases, but later updates get slower and slower.

The timings below were taken right after creating the index, while adding aliases. No other operations were performed on the cluster or the index during that time.
These are the times needed to send and add aliases in batches of 5000:
batch: 5000 - time: 2311ms
batch: 5000 - time: 4096ms
batch: 5000 - time: 6022ms
batch: 5000 - time: 8127ms
batch: 5000 - time: 10174ms
batch: 5000 - time: 11403ms
batch: 5000 - time: 13126ms
batch: 5000 - time: 14335ms
batch: 5000 - time: 16500ms
batch: 5000 - time: 20663ms
batch: 5000 - time: 23002ms
batch: 5000 - time: 24457ms
batch: 5000 - time: 26375ms
batch: 5000 - time: 28984ms
batch: 5000 - time: 30559ms
batch: 5000 - time: 32234ms
batch: 5000 - time: 35098ms
batch: 5000 - time: 38922ms
batch: 5000 - time: 41776ms
batch: 5000 - time: 53402ms
batch: 5000 - time: 58600ms
batch: 5000 - time: 65567ms
batch: 5000 - time: 79885ms
batch: 5000 - time: 89900ms
batch: 5000 - time: 89368ms
batch: 5000 - time: 104109ms

As you can see, it gradually slows down. Is this expected? It looks like the time to add a batch grows linearly with the number of existing aliases. Is that correct? Thanks!
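
One way to check the "linear" impression against the numbers above is to compare how fast the per-batch time rises in the first half of the run with the second half. The sketch below only uses the timings from this post; the split into halves is an arbitrary choice made for illustration.

```python
# Per-batch times in milliseconds, copied from the list above.
times_ms = [
    2311, 4096, 6022, 8127, 10174, 11403, 13126, 14335, 16500,
    20663, 23002, 24457, 26375, 28984, 30559, 32234, 35098, 38922,
    41776, 53402, 58600, 65567, 79885, 89900, 89368, 104109,
]
BATCH_SIZE = 5000


def mean_step(values):
    """Average increase between consecutive measurements."""
    return (values[-1] - values[0]) / (len(values) - 1)


half = len(times_ms) // 2
early = mean_step(times_ms[:half])  # first 13 batches
late = mean_step(times_ms[half:])   # last 13 batches

total_aliases = len(times_ms) * BATCH_SIZE
total_seconds = sum(times_ms) / 1000.0

print("aliases added:", total_aliases)
print("total time: %.0f s" % total_seconds)
print("mean increase per batch, first half: %.0f ms" % early)
print("mean increase per batch, second half: %.0f ms" % late)
```

If each batch costs time proportional to the number of aliases already present, the total time for N aliases grows roughly with N squared. In these measurements the per-batch increase is larger in the second half than in the first, which suggests the growth may be somewhat worse than linear.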

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d8a03b09-9ed2-49c7-9ca8-a2285478d933%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Adding 1mln+ aliases is really slow.

dadoonet
I have never seen that number of aliases. That means you have 5 million users?
Nice project ;-)

My guess is that the cluster state is getting so big that it takes more and more time to update it and copy it to all nodes.

BTW, how many nodes do you have for those 200 shards?

Do you see anything in the logs?

Thinking out loud:
I wonder if creating some kind of alias template could help here to minimize the cluster state size?
Something like what you describe:
{
  'index' : 'index_name',
  'alias' : 'user_{user_id}',
  'filter' : {
    'term' : {
      'user' : '{user_id}'
    }
  },
  'routing' : 'r{user_id}'
}

It looks somewhat similar to what Luca just did with https://github.com/elasticsearch/elasticsearch/pull/5180

Does anyone else have an idea?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Re: Adding 1mln+ aliases is really slow.

Michał Zgliczyński
Currently this runs on 4 nodes, with the possibility of adding more. I get nothing in the logs.
I don't completely understand how https://github.com/elasticsearch/elasticsearch/pull/5180 applies to my use case.
Ideally, I would like not to create so many aliases, but to have a template do the work for me. The template could hold the filtering and routing data. It could be very simple. For a request to:
host:9200/user_{id} => this would automatically match the template "user_*" and use its options. This would also greatly simplify my work later on. When a new user appears while the server is running, the request would automatically use the template's settings, instead of me checking whether the alias exists and then adding it.

This would allow me to create 1 template instead of so many similar aliases. Or maybe this is already implemented?
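
To make the idea concrete, here is a purely hypothetical, client-side sketch of how a single template with a `{user_id}` placeholder could be resolved into a filter and a routing value for a name such as `user_42`. Nothing here is an Elasticsearch feature; `TEMPLATE`, `resolve` and `_fill` are names invented for this illustration.

```python
import re

# Hypothetical template, modelled on the alias structure from this thread.
TEMPLATE = {
    "alias": "user_{user_id}",
    "filter": {"term": {"user": "{user_id}"}},
    "routing": "r{user_id}",
}


def _fill(value, user_id):
    """Recursively replace the {user_id} placeholder in strings and dicts."""
    if isinstance(value, dict):
        return {key: _fill(item, user_id) for key, item in value.items()}
    if isinstance(value, str):
        return value.replace("{user_id}", user_id)
    return value


def resolve(name, template=TEMPLATE):
    """Return filter and routing for name, or None if it does not match."""
    prefix, suffix = template["alias"].split("{user_id}")
    pattern = re.escape(prefix) + r"(?P<user_id>.+)" + re.escape(suffix)
    match = re.fullmatch(pattern, name)
    if match is None:
        return None
    user_id = match.group("user_id")
    return {
        "filter": _fill(template["filter"], user_id),
        "routing": _fill(template["routing"], user_id),
    }


print(resolve("user_42"))
print(resolve("orders"))
```

With something like this, only one template would need to be stored instead of one alias per user, which is the saving in cluster state size discussed above.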

Thanks!

Re: Adding 1mln+ aliases is really slow.

dadoonet
This is exactly what I tried to describe with:

Something like what you describe:
{
  'index' : 'index_name',
  'alias' : 'user_{user_id}',
  'filter' : {
    'term' : {
      'user' : '{user_id}'
    }
  },
  'routing' : 'r{user_id}'
}

It does not exist yet (or I missed it), but I think it could be a nice feature request.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

