How to improve performance of facet queries?


How to improve performance of facet queries?

Guillermo Arias del Río
Hi,

I'm trying to improve the performance of my facet queries. My typical facet query looks a bit more complicated (two facets with regex and a more complex query), but I have reduced it to a very simple example:

{
    "query": {
      "match": {
        "_tokens._all._text.ngram": "kat"
      }
    },
    "facets": {
        "tokens": {
            "terms": {
                "field": "_tokens._all._facet"
            }
        }
    },
    "size": 0
}

The answer is:

{
    "facets": {
        "tokens": {
            "_type": "terms",
            "missing": 0,
            "other": 7321391,
            "terms": [ ... ],
            "total": 7663578
        }
    },
    ...
}


As you can see, there are a lot of documents. My index is 23 GiB in size.

This query takes ~200 ms, but it takes < 5 ms without the facet. The question is: how can I improve its performance? It should be around 10 ms...

(1) I am wondering whether rewriting my query could improve it. "_tokens._all._facet" is a non-analyzed string field, whereas "_tokens._all._text.ngram" is ICU tokenized, ICU folded and ngram analyzed. There are several values per document. Is there anything wrong there I should consider?
(2) I don't know whether it is possible to somehow "index" or "cache" the values of "_tokens._all._facet". It is the field used by the facets all the time, so it is constantly accessed.
(3) If I use a cluster, could a higher number of nodes and shards improve performance? Would Elasticsearch perform the faceting in parallel on the data nodes?
(4) Finally, can you give me advice on how to test whether I am having data-access problems (like I/O blocking)?

Thanks in advance! :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: How to improve performance of facet queries?

ppearcy
I am curious: is this 200 ms the first time the query runs, or 200 ms every time? The first run should always be slower, and subsequent ones should be fast once the index is warmed. If you're trying to optimize the warm-up case, I recommend using index warmers.

Facets currently pull all the values for the field into memory (field cache): http://www.elasticsearch.org/guide/reference/index-modules/fielddata/

I would make sure you have enough RAM allocated to elasticsearch to hold all these in memory. 

To run through your items:
1) I don't think re-writing the query will help on the facet side. 
2) Yeah, this is how things work. Just ensure you are using resident cache type (see link above) so that your facet values stay in memory.
3) This could speed things up, since you are running across multiple nodes/shards, each with a smaller set of values in memory. Just watch out for incorrect facet counts (https://github.com/elasticsearch/elasticsearch/issues/1305)
4) I would ensure that your machine isn't swapping, and especially isn't swapping Elasticsearch (check out the details on mlockall here: http://www.elasticsearch.org/guide/reference/setup/installation/). If you're good there, you can use iostat to check that disk performance is as expected.
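For reference, the resident field cache and mlockall settings Paul mentions would look roughly like this in elasticsearch.yml (a sketch for 0.90-era Elasticsearch; check the linked docs for your version):

```yaml
# elasticsearch.yml (illustrative fragment for 0.90-era Elasticsearch)

# Keep fielddata used by facets resident in memory instead of the
# default soft-referenced cache, so facet values are never evicted:
index.cache.field.type: resident

# Lock the JVM heap into RAM so the OS never swaps Elasticsearch out
# (also requires a suitable memlock ulimit for the Elasticsearch user):
bootstrap.mlockall: true
```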

Best Regards,
Paul





Re: How to improve performance of facet queries?

Guillermo Arias del Río
Hi, Paul,

thanks for your response :)

The time is an average of 5 requests after the first one, so it would be the warmed case, but I don't know if warmers could help me here: in the normal scenario the queries vary a lot, though it is true that I am always faceting on the same field. I also thought my problem had to do with the allocated RAM... but then I ran bigdesk and saw that memory usage never goes above about 60% of the amount I've reserved (currently 6 GiB). I will try to tweak the field cache.

Thank you very much for the tips about (3) and (4). Maybe there's something going wrong there!

Best Regards,
Guillermo



Re: How to improve performance of facet queries?

Guillermo Arias del Río
So, I checked a few things, but I am still lost...

Now I only have one node. I can see with bigdesk that the heap memory never goes over 5 of the available 6 GiB. "_nodes/stats/indices/fielddata/_tokens._all._facet" tells me that I have a fielddata cache of about 500 MiB for the field I'm using in my facet. Still, when I run the tests, I can see Elasticsearch performing I/O operations (I used iotop for this). So what am I doing wrong? By the way, just to be sure, I took the facets out of the query and it takes only a few milliseconds, so it really is the faceting that's taking so long.
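As a sanity check on those numbers, here is a minimal Python sketch of summing a field's fielddata footprint across nodes. The payload is made up, shaped like a 0.90-era response to GET /_nodes/stats/indices/fielddata/_tokens._all._facet; the field name and byte count are just the ones from this thread:

```python
import json

# Hypothetical nodes-stats payload (structure and sizes are illustrative).
sample = json.loads("""
{
  "nodes": {
    "node-1": {
      "indices": {
        "fielddata": {
          "memory_size_in_bytes": 524288000,
          "fields": {
            "_tokens._all._facet": { "memory_size_in_bytes": 524288000 }
          }
        }
      }
    }
  }
}
""")

def fielddata_mib(stats, field):
    # Sum the per-field fielddata footprint over all nodes, in MiB.
    total = 0
    for node in stats["nodes"].values():
        fields = node["indices"]["fielddata"].get("fields", {})
        total += fields.get(field, {}).get("memory_size_in_bytes", 0)
    return total / (1024 * 1024)

print(fielddata_mib(sample, "_tokens._all._facet"))  # 500.0
```

If this total is comfortably below the heap and stable across runs, the slowness is more likely CPU-bound faceting than cache eviction.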

Can someone help me? Paul? :)



Re: How to improve performance of facet queries?

ppearcy
Hey,
  I played around with faceting and got results very similar to yours. Regex vs. normal term facets didn't seem to make a big difference. I didn't do any formal testing, but after running a few different kinds of facet queries I came to the conclusion that facet run time is basically:
(number of result docs) × (number of distinct terms in the field)

So, if you have a specific query or a low-cardinality field, things should be fast. Beyond that, I think you're looking at distributing this across more shards/CPUs to speed things up further.
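A back-of-envelope sketch of that cost model (the document counts are the ones from this thread; the term count and the units are arbitrary, and the model itself is an empirical observation, not documented behavior):

```python
def facet_cost_units(matching_docs, distinct_terms):
    # Relative facet cost under the observed model:
    # (number of result docs) x (number of distinct terms in the field).
    return matching_docs * distinct_terms

# Broad query matching ~all 7.6M docs vs. a narrow query matching 10k,
# faceting the same (hypothetical) 10-term field:
broad = facet_cost_units(7_663_578, 10)
narrow = facet_cost_units(10_000, 10)
print(broad // narrow)  # 766 -> narrowing the query dominates the savings
```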

Best Regards,
Paul
 


Re: How to improve performance of facet queries?

Guillermo Arias del Río
Hi, Paul,

I was doing the same kind of tests, but I found something very interesting: even when I limit my query to 0 results,

{
    "query": {
        "filtered": {
            "filter": { "limit": { "value": 0 } },
            "query": { ... }
        }
    }
}

it still takes over 200 ms! But if I take out the facets, it takes just 2 ms. I am going to test on a cluster soon. I hope I'll be able to find out what's happening.

Thanks again.
Guillermo



Re: How to improve performance of facet queries?

Guillermo Arias del Río
Hi,

I've started a second node on the same server and performance goes up... and I think I've discovered why. Elasticsearch sees that my server has 8 CPUs, so it reserves a search thread pool of 8 threads. When I launch a search, only one of those threads is working; with two nodes, two threads are working, which makes better use of the server's resources. So having more CPUs helps handle more concurrent requests, but it doesn't speed up a single query. Am I right?
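That observation can be sketched with a toy latency model (an assumption for illustration, not measured behavior: the facet work splits evenly across shards and each shard gets one search thread; the 200 ms single-shard figure is the one from this thread):

```python
import math

def single_query_latency_ms(total_ms_one_shard, shards, threads):
    # Per-shard work shrinks as shards grow; shards execute
    # threads-at-a-time, so extra shards beyond the thread pool
    # only add serial "waves" of work.
    per_shard = total_ms_one_shard / shards
    waves = math.ceil(shards / threads)
    return per_shard * waves

print(single_query_latency_ms(200, 1, 8))   # 200.0 -> one shard keeps one thread busy
print(single_query_latency_ms(200, 8, 8))   # 25.0  -> eight shards run in parallel
```

Under this model, splitting the index across more shards (even on one box) is what lets a single query use more than one CPU.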

