Default match_all behavior for match query with no tokens after analysis


John Daniels
Hi all!

Currently I am working on an application which allows users to search on multiple different fields using our own query language. This sometimes requires us to combine a search for analyzed text with another query using an "and" operator. For example, someone could search for:
some text AND color:green

We normally do this by combining a match query with another query for the color using a boolean query. This is fine, but if "some text" consists of only stop words, then we will get no results. For example, if someone searches for:
is AND color:green
then we will get no results. While we can't do anything useful with text that contains only stop words, we would prefer to turn it into a match_all rather than a match_none. Currently, a match query with only stop words yields a Lucene BooleanQuery with no clauses, which never matches any documents. Ideally, we would like a query that yields a Lucene MatchAllDocsQuery when it receives no tokens from the analyzer. This can be achieved, for example, by running a field query with the following query text:
+({possible stopword text here}) *
Unfortunately, this seems like somewhat of a hack and I'd rather not construct Lucene query strings that are just going to be immediately parsed if I can avoid it. I was wondering if there is some better way to get similar semantics by directly using ElasticSearch queries. It's not a super big deal but it would be nice to be able to implement behavior similar to what Lucene query strings already do, but with different semantics for the application.

So does anyone have any recommendations for making such a query without using a Lucene query string? Or is that the best way to do this?

Thanks!
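
One application-side way to get the semantics described above, sketched in Python under stated assumptions: the application already knows which tokens the analyzer produced for the text (the `tokens` argument is hypothetical and would have to come from somewhere else, for example an analyze call), and the color is matched with a term query. The function names and the `body` field are illustrative only, not part of any library.

```python
def text_clause(field, text, tokens):
    """Build the clause for the free-text part of the search.

    `tokens` is assumed to be the analyzer output for `text`, obtained
    elsewhere by the application (hypothetical input).
    """
    if not tokens:
        # Analysis removed everything (e.g. only stop words): match everything
        # so the other clauses alone decide the result set.
        return {"match_all": {}}
    return {"match": {field: text}}


def combined_query(field, text, tokens, color):
    """Combine the text clause with the color restriction using 'and' semantics."""
    return {
        "bool": {
            "must": [
                text_clause(field, text, tokens),
                {"term": {"color": color}},
            ]
        }
    }
```

For example, `combined_query("body", "is", [], "green")` would restrict only on the color, while `combined_query("body", "some text", ["some", "text"], "green")` keeps the match clause. The cost of this approach is that the application has to duplicate or query the analysis step.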

Re: Default match_all behavior for match query with no tokens after analysis

Chris Male
Hi John,

This problem comes up quite a lot, but unfortunately there aren't many options at the moment. MatchQuery (the Query produced by the 'match' query type) currently has hardcoded behaviour for what to do when analysis strips all the terms from the input.

However, I have opened https://github.com/elasticsearch/elasticsearch/issues/2429 to change this, so that you can provide a flag specifying what to do in this situation.
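
A rough sketch of how such a flag could be used from Python, assuming it is exposed on the match query as `zero_terms_query` with the values `none` and `all` (the parameter name used in later Elasticsearch releases; whether a given version supports it is an assumption to check). The helper names and the `body` field are hypothetical.

```python
def match_or_all(field, text):
    """Match query that falls back to match_all when analysis yields no tokens.

    Assumes the match query accepts `zero_terms_query`; "all" asks for
    match_all behaviour instead of the default "none".
    """
    return {
        "match": {
            field: {
                "query": text,
                "zero_terms_query": "all",
            }
        }
    }


def text_and_color(field, text, color):
    """Build the 'text AND color:<color>' query from the original post."""
    return {
        "bool": {
            "must": [
                match_or_all(field, text),
                {"term": {"color": color}},
            ]
        }
    }
```

With this shape, `text_and_color("body", "is", "green")` is intended to return all green documents when "is" is removed as a stop word, without the application building a Lucene query string.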

(In reply to John Daniels's message of Thursday, November 22, 2012 5:37:48 AM UTC+13.)

Re: Default match_all behavior for match query with no tokens after analysis

sulemanmubarik
Hi,
I have a question: why was ZeroTermsQuery added only to MatchQuery, and not to QueryStringQuery or MultiMatchQuery?
I am using MultiMatchQuery and I have the same problem. I can't change it to MatchQuery because I have to search multiple fields. Or is there a way to search multiple fields with MatchQuery?
Thanks
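
One possible way to cover multiple fields with plain match queries, sketched in Python: wrap one match query per field in a boolean query as optional clauses. This assumes the match query accepts `zero_terms_query` (see the flag discussed earlier in the thread); the helper name and the field names in the example are hypothetical, and scoring will not be identical to a multi-match query.

```python
def multi_field_match(fields, text):
    """One match query per field, combined so that any field may match.

    Each clause uses zero_terms_query="all" (assumed to be supported), so when
    analysis strips every token, each clause behaves like match_all and the
    whole query matches all documents.
    """
    return {
        "bool": {
            "should": [
                {"match": {field: {"query": text, "zero_terms_query": "all"}}}
                for field in fields
            ],
            # At least one of the per-field clauses has to match.
            "minimum_should_match": 1,
        }
    }
```

For example, `multi_field_match(["title", "body"], "is")` produces two match clauses, one for each field, and can itself be placed in the `must` list of an outer boolean query.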