Can the order of filters impact performance?

Can the order of filters impact performance?

egaumer
I've got a simple term query with two filters. One is a date range filter and the other is a terms filter for handling ACLs. The ACL filter has about 1000 terms on average, so it's heavy. I suspect that running the date range filter first would significantly restrict the document set, so that the ACL filter runs more efficiently against the smaller subset.

I don't believe Lucene does any query optimization to handle this. Is there any way to guarantee the order of filters? Are my assumptions correct that the order of filters can impact performance?

Thanks,
-Eric

Re: Can the order of filters impact performance?

tbrianjones
I would also really like to know the answer to this question.

Additionally, I'm wondering whether the query can be run before the filters, or vice versa. Will this impact performance? Does Elasticsearch have built-in logic to optimize queries independent of the order in which their parts appear in a request? If we can control the order in which pieces of a query/filter are executed, and that order does impact performance, then please give an implementation example.

On Monday, July 9, 2012 9:11:33 AM UTC-7, egaumer wrote:
> Is there any way to guarantee the order of filters? Are my assumptions correct that the order of filters can impact performance?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Re: Can the order of filters impact performance?

Clinton Gormley-2
On Thu, 2013-02-07 at 21:47 -0800, Brian Jones wrote:
> I would also really like to know the answer to this question.
>
>
> Additionally, I'm wondering if the query can be run before the filters
> or vice versa?  Will this impact performance?  Does Elasticsearch have
> built in logic to optimize queries independent of their order in a
> data request?  If we can control the order in which pieces of a
> query / filter are executed and they do impact performance, then
> please give an implementation example.

Filters are executed in the order in which they are passed to an and/or
filter or to the must/should clauses of a bool. Must clauses are executed
before should clauses (this goes for both filters and queries).
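
To make the ordering point concrete, here is a minimal Python sketch that builds an `and` filter with its clauses in a chosen order. The `and` and `numeric_range` syntax is taken from the query posted later in this thread; the helper name `ordered_and_filter` and the sample values are only illustrative assumptions.

```python
import json

def ordered_and_filter(*filters):
    """Return an `and` filter whose clauses keep the order they were passed in."""
    # A JSON array preserves order, so list position is what determines
    # the execution order described above.
    return {"and": list(filters)}

# Hypothetical clauses: the selective date range first, the heavy ACL terms second.
date_filter = {
    "numeric_range": {
        "date": {"from": "2000-12-17T06:55:00Z", "to": "2001-12-17T06:55:00Z"}
    }
}
acl_filter = {"terms": {"perms": [70008497, 70008496, 70008495]}}

and_filter = ordered_and_filter(date_filter, acl_filter)
print(json.dumps(and_filter, indent=2))
```

Swapping the two arguments is all it takes to try the opposite order when measuring.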

Also, in the next version of ES, "cheap" (i.e. bitset) "should" filter
clauses are executed before the more expensive filter clauses (e.g.
geo-distance).

In a filtered query, I believe the filter and query are executed
together, i.e. filter->query->filter->query and so on. In the next
version, you'll be able to control the order of execution.

In the search API, the top-level "filter" is executed after "query" (and
after facets).
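
The two placements of a filter mentioned here can be sketched as request bodies. This is a rough Python illustration, not a definitive recipe: the function names are made up for the example, and the `field` query and `perms` values are borrowed from the query posted later in the thread.

```python
import json

def filtered_query_body(query, filter_):
    """Filter wrapped with the query in a `filtered` query."""
    return {"query": {"filtered": {"query": query, "filter": filter_}}}

def top_level_filter_body(query, filter_):
    """Top-level search `filter`; per the reply above it runs after the query
    (and after facets)."""
    return {"query": query, "filter": filter_}

# Hypothetical query and ACL filter.
query = {"field": {"headline": "Growth"}}
acl = {"terms": {"perms": [70000166, 70000170, 70000002]}}

print(json.dumps(filtered_query_body(query, acl), indent=2))
print(json.dumps(top_level_filter_body(query, acl), indent=2))
```

Since the top-level filter is applied after facets, facet counts would presumably not reflect it, which matters for ACL-style restrictions.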




Re: Can the order of filters impact performance?

kimchy
Administrator
One more thing to add to Clint's answer: the terms filter and the date range filter internally always execute the "full" filter and end up being represented as a bitset. Even with 100 terms in a terms filter, this should be fast, and filters for ACL-type logic cache nicely. One thing I would add is to use _cache_key for the ACL filter, so that the big list of terms isn't used as the filter cache key. The _cache_key can be something like _user_id_1122_acl.
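
A minimal sketch of that suggestion in Python, assuming `_cache_key` is placed inside the `terms` filter next to the field (the same place `_cache` goes). The helper name `acl_terms_filter` and its parameters are hypothetical; the key format follows the example above.

```python
import json

def acl_terms_filter(user_id, perms, field="perms"):
    """Build a terms filter for a user's ACL with a short, per-user cache key."""
    return {
        "terms": {
            field: list(perms),
            # Assumption: `_cache_key` sits beside the field, like `_cache`.
            "_cache_key": "_user_id_%d_acl" % user_id,
        }
    }

acl_filter = acl_terms_filter(1122, [70008497, 70008496, 70008495])
print(json.dumps(acl_filter, indent=2))
```

One thing to watch with a scheme like this: if a user's permissions change while the key stays the same, a previously cached result could presumably be reused, so the key may need a version or hash component.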

On Feb 8, 2013, at 11:19 AM, Clinton Gormley wrote:

> Filters are executed in the order they are passed in to an and/or or
> must/should clause. must clauses are executed before should clauses
> (this goes for filters and queries)

Re: Can the order of filters impact performance?

egaumer
We did a fair amount of performance testing around permission filters, and the bitsets (i.e., the terms filter) perform really well. We expected issues given some past experience with a proprietary search product, but this wasn't the case with Elasticsearch, even with filters ranging from 8K to 10K unique permissions.
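
As a toy illustration of why bitset filters combine cheaply, the sketch below models each filter's matches as a Python integer used as a bitset and intersects them with a bitwise AND. This is only a conceptual model with made-up document ids, not how Lucene or Elasticsearch implement their bitsets.

```python
def to_bitset(doc_ids):
    """Pack a collection of matching doc ids into an int used as a bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def from_bitset(bits):
    """Unpack a bitset back into a sorted list of doc ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical matches in a 10-document index.
date_matches = to_bitset([1, 2, 3, 7])
acl_matches = to_bitset([0, 2, 3, 5, 7, 9])

# Once both filters are bitsets, combining them is a single AND,
# and the result is the same whichever filter comes first.
both = date_matches & acl_matches
print(from_bitset(both))  # [2, 3, 7]
```

In this model the cost sits in building each bitset once; after that, the order of the two operands makes no difference to the result.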

I roughly recall the first query averaging around 150ms, with subsequent queries (cached filters) averaging ~30ms. The setup: 250K unique queries, a 2-node cluster with 1 shard (~300GB) and 1 replica, SSD, 128GB RAM, 8GB heap, 10GigE, no faceting, no sorting, basic boolean searches. We were mainly interested in how these large terms filters affected performance.

Queries looked roughly like the following... 

    {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "should": [
                            {"field": {"headline": "Growth"}},
                            {"field": {"text": "Growth"}}
                        ]
                    }
                },
                "filter": {
                    "and": [{
                        "numeric_range": {
                            "date": {"from": "2000-12-17T06:55:00Z", "to": "2001-12-17T06:55:00Z"}
                        }
                    },{
                        "terms": {
                            "perms": [70008497, 70008496, 70008495, ..., 70000166, 70000170, 70000002]
                        }
                    }]
                }
            }
        }
    }


We didn't use a cache key but that would obviously help reduce cache sizes. These were very contrived tests.
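
For a rough sense of the size difference a cache key could make, the Python sketch below compares the serialized length of a 10K-term list with a short key. The permission ids are generated for illustration, and serialized length is only a proxy; how Elasticsearch actually stores its cache keys isn't described in this thread.

```python
import json

# Hypothetical ACL list: 10,000 eight-digit permission ids.
perms = list(range(70000000, 70010000))

# Compact JSON form of the terms list versus a short per-user key.
terms_as_key = json.dumps(perms, separators=(",", ":"))
short_key = "_user_id_1122_acl"

print(len(terms_as_key), len(short_key))
```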



On Tuesday, February 12, 2013 5:40:45 PM UTC-5, kimchy wrote:
> One thing that I would add, is use _cache_key for the ACL filter, so the big list of terms won't be used as the relevant filter cache key, the _cache_key can be something like _user_id_1122_acl.