Boolean operators entered in lower case returning unexpected results

kranti.vns
CONTENTS DELETED
The author has deleted this message.

Re: Boolean operators entered in lower case returning unexpected results

kimchy
Administrator
It's not because "and" is a boolean operator; it's because it's a stop word, and it's removed during the analysis process. So the text "dog and cat" will be broken down into the terms "dog", "cat" by the standard analyzer. You can build your own analyzer that does not remove stop words.
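
To illustrate the point above, here is a minimal sketch of stop word removal in Python. It is a toy stand-in, not Lucene's actual analyzer: the `analyze` function and its regex tokenizer are hypothetical, and the stop word set is assumed to mirror Lucene's default English list. Note that "and", "or" and "not" are all in that list, which would explain why those three words in particular seem to vanish from the queries quoted below.

```python
import re

# Assumed to mirror Lucene's default English stop word set.
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def analyze(text, stop_words=ENGLISH_STOP_WORDS):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(analyze("dog and cat"))                         # ['dog', 'cat']
print(analyze("snort AND"))                           # ['snort']
print(analyze("snort and", stop_words=frozenset()))   # ['snort', 'and']
```

Because lowercasing happens before the stop word filter in this sketch, the upper-case "AND" inside a quoted phrase is dropped as well, leaving only "snort" to match. Passing an empty stop word set keeps every term.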

On Tue, Dec 13, 2011 at 1:45 PM, kranti.vns <[hidden email]> wrote:
Hi Team,

We are facing an issue when the query string contains Boolean operators (AND,
OR and NOT) in lower case. For example, when the following query is fired on
the ES server, we get results back even though no matching data exists.

Query:

curl -XGET 'http://localhost:9200/_search?pretty=1' -d '{
 "from" : 0,
 "size" : 5,
 "query" : {
   "filtered" : {
     "query" : {
       "query_string" : {
         "query" : "message:snort message:and",
         "default_operator" : "and",
         "analyzer" : "search_analyzer",
         "allow_leading_wildcard" : false,
         "analyze_wildcard" : true
       }
     },
     "filter" : {
       "range" : {
         "date" : {
           "from" : "2011/08/01 00:00:00",
           "to" : "2011/08/15 23:59:59",
           "include_lower" : true,
           "include_upper" : false
         }
       }
     }
   }
 },
 "explain" : true,
 "sort" : [ {
   "date" : {
     "order" : "desc"
   }
 } ]
}'

As can be seen, the default operator is AND. We don't have any message that
contains both "snort" and "and", yet we still get results back. The explain
output follows:

{
 "took" : 348,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
 },
 "hits" : {
   "total" : 4692830,
   "max_score" : null,
   "hits" : [ {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195322",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:58",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:58 hallmark-snrt-008
snort[14387]: [1:402:8] ICMP Destination Unreachable Port Unreachable
[Classification: Misc activity] [Priority: 3] {ICMP} 10.27.32.46 ->
172.17.1.26",  "service" : "snort",  "sip" : "10.27.32.46",  "dip" :
"172.17.1.26",  "sigid" : "1:402",  "signame" : "ICMP Destination
Unreachable Port Unreachable",  "app" : "Snort IDS",  "ips-severity" : "3",
"ips-category" : "Misc activity"  },
     "sort" : [ 1313193598000 ],
     "_explanation" : {
       "value" : 1.0179296,
       "description" : "fieldWeight(message:snort in 4660), product of:",
       "details" : [ {
         "value" : 1.0,
         "description" : "tf(termFreq(message:snort)=1)"
       }, {
         "value" : 1.0179296,
         "description" : "idf(docFreq=98223, maxDocs=100001)"
       }, {
         "value" : 1.0,
         "description" : "fieldNorm(field=message, doc=4660)"
       } ]
     }
   }, {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195324",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:58",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:58 hallmark-snrt-008
snort[14387]: [1:402:8] ICMP Destination Unreachable Port Unreachable
[Classification: Misc activity] [Priority: 3] {ICMP} 10.27.32.46 ->
172.17.1.26",  "service" : "snort",  "sip" : "10.27.32.46",  "dip" :
"172.17.1.26",  "sigid" : "1:402",  "signame" : "ICMP Destination
Unreachable Port Unreachable",  "app" : "Snort IDS",  "ips-severity" : "3",
"ips-category" : "Misc activity"  },
     "sort" : [ 1313193598000 ],
     "_explanation" : {
       "value" : 1.0179296,
       "description" : "fieldWeight(message:snort in 4661), product of:",
       "details" : [ {
         "value" : 1.0,
         "description" : "tf(termFreq(message:snort)=1)"
       }, {
         "value" : 1.0179296,
         "description" : "idf(docFreq=98223, maxDocs=100001)"
       }, {
         "value" : 1.0,
         "description" : "fieldNorm(field=message, doc=4661)"
       } ]
     }
   }, {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195318",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:56",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:56 hallmark-snrt-008
snort[14387]: [1:254:9] DNS SPOOF query response with TTL of 1 min. and no
authority [Classification: Potentially Bad Traffic] [Priority: 2] {UDP}
162.94.3.251:53 -> 192.168.40.40:58602",  "service" : "snort",  "sip" :
"162.94.3.251",  "dip" : "192.168.40.40",  "dport" : "58602",  "sigid" :
"1:254",  "signame" : "DNS SPOOF query response with TTL of 1 min. and no
authority",  "app" : "Snort IDS",  "ips-severity" : "2",  "ips-category" :
"Potentially Bad Traffic"  },
     "sort" : [ 1313193596000 ],
     "_explanation" : {
       "value" : 1.0179296,
       "description" : "fieldWeight(message:snort in 4658), product of:",
       "details" : [ {
         "value" : 1.0,
         "description" : "tf(termFreq(message:snort)=1)"
       }, {
         "value" : 1.0179296,
         "description" : "idf(docFreq=98223, maxDocs=100001)"
       }, {
         "value" : 1.0,
         "description" : "fieldNorm(field=message, doc=4658)"
       } ]
     }
   }, {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195320",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:56",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:56 hallmark-snrt-008
snort[14387]: [1:254:9] DNS SPOOF query response with TTL of 1 min. and no
authority [Classification: Potentially Bad Traffic] [Priority: 2] {UDP}
162.94.3.251:53 -> 192.168.40.40:58602",  "service" : "snort",  "sip" :
"162.94.3.251",  "dip" : "192.168.40.40",  "dport" : "58602",  "sigid" :
"1:254",  "signame" : "DNS SPOOF query response with TTL of 1 min. and no
authority",  "app" : "Snort IDS",  "ips-severity" : "2",  "ips-category" :
"Potentially Bad Traffic"  },
     "sort" : [ 1313193596000 ],
     "_explanation" : {
       "value" : 1.0179296,
       "description" : "fieldWeight(message:snort in 4659), product of:",
       "details" : [ {
         "value" : 1.0,
         "description" : "tf(termFreq(message:snort)=1)"
       }, {
         "value" : 1.0179296,
         "description" : "idf(docFreq=98223, maxDocs=100001)"
       }, {
         "value" : 1.0,
         "description" : "fieldNorm(field=message, doc=4659)"
       } ]
     }
   }, {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195310",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:55",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:55 hallmark-snrt-008
snort[14387]: [1:384:5] ICMP PING [Classification: Misc activity] [Priority:
3] {ICMP} 192.168.114.189 -> 10.15.1.103",  "service" : "snort",  "sip" :
"192.168.114.189",  "dip" : "10.15.1.103",  "sigid" : "1:384",  "signame" :
"ICMP PING",  "app" : "Snort IDS",  "ips-severity" : "3",  "ips-category" :
"Misc activity"  },
     "sort" : [ 1313193595000 ],
     "_explanation" : {
       "value" : 1.0179296,
       "description" : "fieldWeight(message:snort in 4654), product of:",
       "details" : [ {
         "value" : 1.0,
         "description" : "tf(termFreq(message:snort)=1)"
       }, {
         "value" : 1.0179296,
         "description" : "idf(docFreq=98223, maxDocs=100001)"
       }, {
         "value" : 1.0,
         "description" : "fieldNorm(field=message, doc=4654)"
       } ]
     }
   } ]
 }
}

As can be seen, the explanation accounts only for "snort", not for "and".

Please let us know the cause of this unexpected behavior.

The same behavior is seen when we enter a query like the following, which
looks for the literal string "snort AND":

curl -XGET 'http://localhost:9200/_search?pretty=1' -d '{
 "from" : 0,
 "size" : 2,
 "query" : {
   "filtered" : {
     "query" : {
       "query_string" : {
         "query" : "\"snort AND\"",
         "default_operator" : "and",
         "analyzer" : "search_analyzer",
         "allow_leading_wildcard" : false,
         "analyze_wildcard" : true
       }
     },
     "filter" : {
       "range" : {
         "date" : {
           "from" : "2011/08/01 00:00:00",
           "to" : "2011/08/15 23:59:59",
           "include_lower" : true,
           "include_upper" : false
         }
       }
     }
   }
 },
 "explain" : true,
 "sort" : [ {
   "date" : {
     "order" : "desc"
   }
 } ]
}'

The explain output is:

{
 "took" : 1029,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
 },
 "hits" : {
   "total" : 4692830,
   "max_score" : null,
   "hits" : [ {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195322",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:58",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:58 hallmark-snrt-008
snort[14387]: [1:402:8] ICMP Destination Unreachable Port Unreachable
[Classification: Misc activity] [Priority: 3] {ICMP} 10.27.32.46 ->
172.17.1.26",  "service" : "snort",  "sip" : "10.27.32.46",  "dip" :
"172.17.1.26",  "sigid" : "1:402",  "signame" : "ICMP Destination
Unreachable Port Unreachable",  "app" : "Snort IDS",  "ips-severity" : "3",
"ips-category" : "Misc activity"  },
     "sort" : [ 1313193598000 ],
     "_explanation" : {
       "value" : 0.15583801,
       "description" : "fieldWeight(_all:snort in 4660), product of:",
       "details" : [ {
         "value" : 1.2247449,
         "description" : "btq, product of:",
         "details" : [ {
           "value" : 1.2247449,
           "description" : "tf(phraseFreq=1.5)"
         }, {
           "value" : 1.0,
           "description" : "allPayload(...)"
         } ]
       }, {
         "value" : 1.0179296,
         "description" : "idf(_all:  snort=98223)"
       }, {
         "value" : 0.125,
         "description" : "fieldNorm(field=_all, doc=4660)"
       } ]
     }
   }, {
     "_shard" : 0,
     "_node" : "fvTUBsH4SRWZlAiw6ZSr_Q",
     "_index" : "20110812",
     "_type" : "syslog",
     "_id" : "1313209533-195324",
     "_score" : null, "_source" : { "date" : "2011/08/12 23:59:58",  "host"
: "hallmark-snrt-008",  "message" : "Aug 12 23:59:58 hallmark-snrt-008
snort[14387]: [1:402:8] ICMP Destination Unreachable Port Unreachable
[Classification: Misc activity] [Priority: 3] {ICMP} 10.27.32.46 ->
172.17.1.26",  "service" : "snort",  "sip" : "10.27.32.46",  "dip" :
"172.17.1.26",  "sigid" : "1:402",  "signame" : "ICMP Destination
Unreachable Port Unreachable",  "app" : "Snort IDS",  "ips-severity" : "3",
"ips-category" : "Misc activity"  },
     "sort" : [ 1313193598000 ],
     "_explanation" : {
       "value" : 0.15583801,
       "description" : "fieldWeight(_all:snort in 4661), product of:",
       "details" : [ {
         "value" : 1.2247449,
         "description" : "btq, product of:",
         "details" : [ {
           "value" : 1.2247449,
           "description" : "tf(phraseFreq=1.5)"
         }, {
           "value" : 1.0,
           "description" : "allPayload(...)"
         } ]
       }, {
         "value" : 1.0179296,
         "description" : "idf(_all:  snort=98223)"
       }, {
         "value" : 0.125,
         "description" : "fieldNorm(field=_all, doc=4661)"
       } ]
     }
   } ]
 }
}

There is no exact literal string "snort AND" in any message, yet we still
get data back. The "AND" is completely ignored, as can be seen from the
explain output.

If we search for something like "snort abc", we get no data back, as there
is no such literal string. It is only when we enter something like "snort
and" or "snort not" that we get data back.

Please suggest how to handle the lower-case Boolean operators "and", "or" and
"not". Also, how should we handle the Boolean operators "AND", "OR" and "NOT"
when they are part of a literal string?

Thanks
Kranti



--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Boolean-operators-entered-in-lower-case-returning-unexpected-results-tp3582092p3582092.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Re: Boolean operators entered in lower case returning unexpected results

ElasticUsers
Hi Shay,

I am using the ES Java API for searching. Please let me know how I can specify an analyzer while searching with the Java API.

Also, it would be a great help if you could tell me how and where I should specify the analyzer while creating indices.

Thanks

Re: Boolean operators entered in lower case returning unexpected results

ElasticUsers
Any inputs/suggestions on this?

Re: Boolean operators entered in lower case returning unexpected results

kimchy
Administrator
In reply to this post by ElasticUsers
First, don't send the mail on another thread; it has nothing to do with this thread. You can specify an analyzer when you search, depending on the query type that you use. For example, query_string accepts an analyzer option.

You can specify a default analyzer in your index config, which will be used for all (string) fields by default. Also, you can specify an analyzer associated with a field in the mapping.
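
As a rough sketch of the two places mentioned above, the Python snippet below builds the JSON bodies one might send when creating an index and when searching. Several details are assumptions rather than anything stated in this thread: the analyzer name `keep_stopwords` is hypothetical, the `"stopwords": "_none_"` setting and the `"string"` field type depend on the Elasticsearch version in use, and the `syslog` type and `message` field are simply borrowed from the query output quoted below. Check the documentation for your version before relying on any of it.

```python
import json

# Hypothetical analyzer name, chosen for this example only.
ANALYZER_NAME = "keep_stopwords"

# Body for index creation: a standard analyzer configured to keep stop
# words, attached to the "message" field in the mapping.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "standard",
                    "stopwords": "_none_",  # assumption: disables stop word removal
                }
            }
        }
    },
    "mappings": {
        "syslog": {
            "properties": {
                "message": {"type": "string", "analyzer": ANALYZER_NAME}
            }
        }
    },
}

# Body for search: query_string with an explicit analyzer option.
query_body = {
    "query": {
        "query_string": {
            "query": "message:snort message:and",
            "default_operator": "and",
            "analyzer": ANALYZER_NAME,
        }
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(query_body, indent=2))
```

Documents indexed before the analyzer change would still lack the stop word terms, so a reindex is likely needed for "and" to become searchable.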

On Fri, Jan 6, 2012 at 9:25 AM, ElasticUsers <[hidden email]> wrote:
Hi Shay,

I am using the ES Java API for searching. Please let me know how I can
specify an analyzer while searching with the Java API.

Also, it would be a great help if you could tell me how and where I should
specify the analyzer while creating indices.

Thanks

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Boolean-operators-entered-in-lower-case-returning-unexpected-results-tp3582092p3637252.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.