slow performance on phrase queries in should clause


Kireet Reddy
Our system is normally very responsive, but very occasionally people submit long phrase queries that time out and cause high system load. Not all long phrase queries cause issues, but I have been debugging one that does.[1]

The query is in the filter section of a constant score query, as below. This form times out. However, if I move the query out of the should section and into the must section, it runs very quickly (in the full query, there was another filter in the should section). Converting this to an AND filter is also fast. Is there a reason for this? Are should filters executed on the full set rather than short-circuited with the results of the must filters?

{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": { "terms": { -- selective terms filter.... -- }  },
                    "should": { "query": { "match": { "text": { "query": "…", "type": "phrase" } } } }
                }
            }
        }
    }
}
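
For anyone who wants to reproduce the comparison, here is a minimal Python sketch that builds both the slow (should) and fast (must) forms of the request body. The helper names and the `hypothetical_field` terms filter are assumptions for illustration only, since the real selective filter is elided in the post; the `text` field and the `"type": "phrase"` match come from the query above.

```python
import json


def phrase_filter(text_field, phrase):
    """Wrap a match-phrase query as a filter, mirroring the query in the post."""
    return {"query": {"match": {text_field: {"query": phrase, "type": "phrase"}}}}


def constant_score_bool(selective_filter, phrase, phrase_clause="should"):
    """Build the constant_score request with the phrase filter in `phrase_clause`.

    selective_filter: the selective terms filter (elided in the post), as a dict.
    phrase_clause: "should" gives the form reported as slow, "must" the fast one.
    """
    if phrase_clause not in ("should", "must"):
        raise ValueError("phrase_clause must be 'should' or 'must'")
    bool_filter = {"must": [selective_filter]}
    clause = phrase_filter("text", phrase)
    if phrase_clause == "must":
        bool_filter["must"].append(clause)
    else:
        bool_filter["should"] = [clause]
    return {"query": {"constant_score": {"filter": {"bool": bool_filter}}}}


if __name__ == "__main__":
    # Hypothetical stand-in for the selective terms filter elided in the post.
    selective = {"terms": {"hypothetical_field": ["a", "b"]}}
    for clause in ("should", "must"):
        body = constant_score_bool(selective, "…", clause)
        print(json.dumps(body, ensure_ascii=False, indent=2))
```

Only the position of the phrase filter differs between the two bodies, which makes it easier to attribute a timing difference to the clause type.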






[1] query -- ぶ新サービスは2015年春にリリースの予定。IoTのハードウェアそのものではなく、SDKやデータベース、解析、IDといったバックグラウンド環境をサービスとして提供するというものだ。発表後、松本氏は「例えばイケてる時計型のプロダクトを作ったとして、(機能面では)単体での価値は1〜2割だったりする。でも本当に重要なのはバックエンド。しかしユーザーから見てみれば時計というプロダクトそのものに大きな価値を感じることが多い。そうであれば、IoTのバックエンドをBaaS(Backend as a Service:ユーザーの登録や管理、データ保管といったバックエンド環境をサービスとして提供すること)のように提供できればプロダクトの開発に集中できると思う。クラウドが出てネットサービスの開発が手軽になったのと同じような環境を提供したい」とサービスについて語ってくれた。 

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5b1e6260-5c19-4ac7-bf1e-939360bf509e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: slow performance on phrase queries in should clause

Michael McCandless-3
It's likely the should clause is (stupidly) being fully expanded before being AND'd with the must clause ... but there are improvements to this (XBooleanFilter.java) in master. Are you able to test and see if it's still slow?


In reply to Kireet Reddy's message of 2014-12-04 19:21 GMT-05:00.

Re: slow performance on phrase queries in should clause

Kireet Reddy
I spent some more time debugging this yesterday, and it started driving me a little crazy. To test my theory, I reduced the number of terms in my must filter from ~100 to 1. If the should clause were executing over all documents, the query should have remained slow. But it ended up executing quickly! So I am a little lost as to what's going on. Does Elasticsearch/Lucene use any heuristics about which clause to execute first that might cause this? I am using 1.3.5.
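
An experiment like this can be made repeatable with a small harness. The following is only a sketch: `search` is a hypothetical callable that sends a request body to the cluster and returns the parsed response, and the only response field relied on is `took`, the elapsed time in milliseconds that Elasticsearch reports for a search.

```python
def compare_took(search, bodies):
    """Run each labelled request body and return {label: took_ms}, slowest first.

    search: hypothetical callable taking a request body dict and returning the
            parsed search response dict (for example a thin HTTP client wrapper).
    bodies: dict mapping a label to a request body dict.
    """
    results = {}
    for label, body in bodies.items():
        response = search(body)
        # "took" is the server-side time for the search, in milliseconds.
        results[label] = response["took"]
    return dict(sorted(results.items(), key=lambda item: item[1], reverse=True))
```

Running each variant several times is advisable, because filter caching can make a later run look faster than the first.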

I'll ask our ops guys whether we can set up an installation of the master branch and see if there's any improvement. Would I need to change the query at all? In the meantime, is there anything I can do on the 1.3 branch? Should I split the should clauses off into a separate bool filter and wrap it in an AND? I.e.,
AND of
  + bool filters with selective terms filter
  + bool filters with must filters
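
The proposed restructuring could look roughly like the sketch below. It reads the second item as the bool filter that holds the should clauses (an interpretation based on the sentence before the list), and the function and argument names are hypothetical. Whether this form is actually faster on 1.3 has not been measured in this thread.

```python
def split_into_and(selective_filter, should_filters):
    """Put the must part and the should part in separate bool filters under `and`.

    selective_filter: the selective terms filter, as a dict.
    should_filters: iterable of filter dicts that were in the should section.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "and": [
                        {"bool": {"must": [selective_filter]}},
                        {"bool": {"should": list(should_filters)}},
                    ]
                }
            }
        }
    }
```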

Also, I've run into a few of these performance issues. It would have been really helpful to have something like an explain plan for database queries, or an explain-type option on the query that collects performance info at each step of processing and sends it back with the results. Right now it's really kind of a black box for me, especially with caching kicking in at times. Has there ever been any thought about implementing something like this in Lucene/Elasticsearch?

Thanks
Kireet

In reply to Michael McCandless's message of Friday, December 5, 2014 3:12:49 AM UTC-8.

Re: slow performance on phrase queries in should clause

InquiringMind
Just a wild guess here, but do the slow phrase queries contain duplicates? For example,

"the the the the the"

Again, just a guess based on some past experience with another engine. Duplicate words in a phrase query would cause a significant slowdown even with a tiny database on a locally hosted blazingly fast machine. That was its only weak point, but it was a significant weak point.
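
One quick way to check a suspect phrase for this is to count repeated terms. This is a minimal sketch with a hypothetical helper name; it assumes the phrase has already been split into tokens. For the Japanese query in this thread, the tokens would have to come from the same analyzer the index uses (for example via the `_analyze` endpoint), since whitespace splitting will not work there.

```python
from collections import Counter


def repeated_terms(tokens):
    """Return {term: count} for every term that occurs more than once."""
    counts = Counter(tokens)
    return {term: n for term, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Whitespace splitting is only adequate for this English example.
    print(repeated_terms("the the the the the".split()))
```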

Brian
