Querystring search: Tokens are out of order

Querystring search: Tokens are out of order

Dave Reed
I have the following search:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "default_operator": "AND",
          "query": "details:foo\\-bar"
        }
      },
      "filter": {
        "term": {
          "deleted": false
        }
      }
    }
  }
}



The details field is analyzed using a pattern analyzer, like so:

settings: {
  index.analysis.analyzer.letterordigit.pattern: "[^\\p{L}\\p{N}]+",
  index.analysis.analyzer.letterordigit.type: "pattern"
}


This breaks the field into tokens, splitting on any run of characters that are neither letters nor digits.

But the user is searching for "foo-bar", which contains a non-alphanumeric character. I assume, but correct me if I'm wrong, that ES will apply the same analyzer to that string. So it is broken into two tokens, ["foo", "bar"], and then the default_operator kicks in and essentially turns the query into "details:foo AND details:bar".
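To make that assumption concrete, here is a minimal Python sketch of the behaviour described above. It is only a model, not Elasticsearch itself: Python's built-in re module has no \p{L} or \p{N} classes, so [\W_]+ stands in as an approximation of the pattern, the lowercasing step is an assumption, and the function names are made up for illustration.

```python
import re

# Approximation of the thread's "[^\p{L}\p{N}]+" pattern: Python's built-in
# re module has no \p{..} classes, so [\W_]+ (non-word characters plus
# underscore) stands in for "anything that is not a letter or digit".
SEPARATORS = re.compile(r"[\W_]+")


def tokenize(text):
    """Split on separator runs and lowercase (assumed), dropping empty tokens."""
    return [tok for tok in SEPARATORS.split(text.lower()) if tok]


def matches_all_terms(query, document):
    """Bag-of-words AND: every query token occurs somewhere in the document."""
    return set(tokenize(query)) <= set(tokenize(document))


print(tokenize("foo-bar"))                          # ['foo', 'bar']
print(matches_all_terms("foo-bar", "foo xyz bar"))  # True
print(matches_all_terms("foo-bar", "bar xyz foo"))  # True
```

Because the check only asks whether each token is present, the reversed document matches just as well as the in-order one.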

My problem is that it will match documents containing "foo xyz bar" and "bar xyz foo" -- in the latter case, the tokens are in the reverse order from the user's search. I'm fine with it matching the former, but it's a stretch to convince the user that the latter is intended.

The search string is provided by the user, so I can't really build a complex query with different query types, hence the basic querystring search. 

Any advice or corrections to my assumptions are appreciated!


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4a204214-f209-48dd-a13a-96463609ad7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Querystring search: Tokens are out of order

James Macdonald
Your analysis of what is going on sounds correct. However, Elasticsearch's results are also correct. When it analyzes the search string, your query becomes a match query on "foo" AND "bar", which matches any document containing both of those terms. Most queries against analyzed fields do not respect the original ordering of the terms.

One thing you could try is looking into the match_phrase query (http://www.elastic.co/guide/en/elasticsearch/guide/master/phrase-matching.html), which is aware of the ordering of the terms. A plain match_phrase query for "foo bar" will match neither "foo xyz bar" nor "bar xyz foo". If you still need to match things like "foo xyz bar", you may be able to do that using the slop parameter, depending on what exactly the use case is.
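As a rough sketch of what that could look like, the Python snippet below builds a search body that keeps the "filtered" wrapper and "deleted" filter from the original post and swaps the query part for match_phrase with a slop. The function name phrase_search_body is hypothetical, and the slop value of 1 is only an assumption about how many intervening tokens should be tolerated; verify the behaviour against your own index and Elasticsearch version.

```python
import json


def phrase_search_body(field, text, slop=0):
    """Build a filtered search body whose query part is a match_phrase."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "match_phrase": {
                        # slop is the number of position moves tolerated
                        # between the analyzed terms of the phrase.
                        field: {"query": text, "slop": slop}
                    }
                },
                # Same filter as in the original search.
                "filter": {"term": {"deleted": False}},
            }
        }
    }


print(json.dumps(phrase_search_body("details", "foo-bar", slop=1), indent=2))
```

With slop left at 0 this should behave like a strict phrase; raising it loosens the match, so it is worth testing with both "foo xyz bar" and "bar xyz foo" before settling on a value.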

James

Re: Querystring search: Tokens are out of order

Dave Reed
Thanks, though unless I am misunderstanding it, the docs imply otherwise:

For example, from:
http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

The query string is parsed into a series of terms and operators. A term can be a single word — quick or brown — or a phrase, surrounded by double quotes — "quick brown" — which searches for all the words in the phrase, in the same order.

So what gives? :)

Re: Querystring search: Tokens are out of order

Dave Reed
To perhaps answer my own question, I think I understand the difference.

details:"foo bar"

would search for the tokens in the same order (as implied by the docs I referenced). But

details:foo-bar

would not honor the order. The quotes do more than just enclose the phrase... If that is true, then these two queries are not the same, which is not what I had thought:

details:foo\ bar
!=
details:"foo bar"

Or am I barking up the wrong tree...
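If the quoted form is indeed what preserves order, one option for user-supplied input is to wrap it in double quotes before handing it to query_string. The Python sketch below is only an illustration under that assumption: as_phrase_query is a made-up helper, it treats the entire input as a single phrase (so any operators the user typed lose their meaning), and the optional ~N suffix relies on the proximity syntax of the query string language.

```python
def as_phrase_query(field, user_text, slop=None):
    """Wrap raw user input in double quotes for query_string syntax."""
    # Escape backslashes first, then double quotes, so the input cannot
    # close the phrase early.
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    phrase = '{}:"{}"'.format(field, escaped)
    if slop is not None:
        phrase += "~{}".format(slop)  # proximity suffix, e.g. "foo bar"~1
    return phrase


# The query part of the original search, with the user input quoted.
query_part = {
    "query_string": {
        "default_operator": "AND",
        "query": as_phrase_query("details", "foo-bar"),
    }
}

print(query_part["query_string"]["query"])         # details:"foo-bar"
print(as_phrase_query("details", "foo-bar", 1))    # details:"foo-bar"~1
```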

Re: Querystring search: Tokens are out of order

Ivan Brusic

Your understanding is correct. The former will be translated into a Lucene phrase query, which uses the positions of the terms within each document to find matches.

In both cases the query text is analyzed, but the latter will simply be a bag-of-words query, which ignores positions.
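The difference can be illustrated with a toy model in Python. This is a simplified sketch of the idea, not how Lucene implements it: the helper names are invented, documents are plain token lists, and the phrase check only handles exact adjacency (no slop).

```python
def positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for pos, term in enumerate(tokens):
        index.setdefault(term, []).append(pos)
    return index


def bag_of_words_match(query_terms, doc_tokens):
    """All terms present somewhere; positions are never consulted."""
    index = positions(doc_tokens)
    return all(term in index for term in query_terms)


def phrase_match(query_terms, doc_tokens):
    """Terms must occur at consecutive positions, in query order."""
    index = positions(doc_tokens)
    if not query_terms:
        return False
    for start in index.get(query_terms[0], []):
        if all(start + offset in index.get(term, [])
               for offset, term in enumerate(query_terms)):
            return True
    return False


query = ["foo", "bar"]
for doc in (["foo", "bar"], ["foo", "xyz", "bar"], ["bar", "xyz", "foo"]):
    print(doc, bag_of_words_match(query, doc), phrase_match(query, doc))
```

In this model all three documents pass the bag-of-words check, while only the first passes the phrase check.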

Cheers,

Ivan
