How to get Elasticsearch boolean match working for multiple fields

Dominic Nicholas

Hi,

I need some expert guidance on getting a bool match working. I'd like the query to return a hit only if 'message' matches 'Failed password for' and 'path' matches '/var/log/secure'.

This is my query:

curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
    "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
    "query" : {
        "bool" : {
            "must" : [
                {  "match_phrase" : { "message" : "Failed password for" } },
                {  "match_phrase" : { "path"    : "/var/log/secure"     } }
            ]
        }
    }
} '

Here is the start of the output from the search:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 13.308596,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 13.308596,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    }, ...

The problem is that if I change '/var/log/secure' to just 'var', say, and rerun the query, I still get a result, just with a lower score. I understood that the bool...must construct meant both match clauses would need to succeed. What I'm after is no result if 'path' doesn't exactly match '/var/log/secure':

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 10.354593,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 10.354593,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    },...

I checked the mappings for these fields to confirm that they are not analyzed:

curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'

I think these fields are not analyzed, and so I believe the search text will not be analyzed either (based on some Elasticsearch training documentation I read recently). Here is a snippet of the _mapping output for this index:

      ....
      "message" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      "path" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      ....

Where am I going wrong (in a bunch of places, I'm sure), and what am I misunderstanding here (probably a lot!)?

Any help would be much appreciated!

Thanks
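
A note on reading the mapping snippet above: in Elasticsearch 1.x, a string field with no explicit "index" setting is analyzed by default, so in that snippet only the "raw" sub-fields are not_analyzed. Below is a minimal Python sketch of that check. The mapping is copied in by hand as a dict, and the helper name index_settings is made up for illustration; it is not part of any Elasticsearch client.

```python
# Mapping snippet from the post above, reduced to the two fields of interest.
RAW_SUBFIELD = {"type": "string", "index": "not_analyzed", "ignore_above": 256}

mapping = {
    "message": {
        "type": "string",
        "norms": {"enabled": False},
        "fields": {"raw": dict(RAW_SUBFIELD)},
    },
    "path": {
        "type": "string",
        "norms": {"enabled": False},
        "fields": {"raw": dict(RAW_SUBFIELD)},
    },
}


def index_settings(properties):
    """Return {field_name: index setting}, including multi-field sub-fields.

    Assumes ES 1.x behaviour, where a string field without an explicit
    "index" key is analyzed.
    """
    result = {}
    for name, spec in properties.items():
        result[name] = spec.get("index", "analyzed")
        for sub_name, sub_spec in spec.get("fields", {}).items():
            result[name + "." + sub_name] = sub_spec.get("index", "analyzed")
    return result


for field, setting in sorted(index_settings(mapping).items()):
    print(field, "->", setting)
```

Under that assumption, the sketch reports "message" and "path" as analyzed and only "message.raw" and "path.raw" as not_analyzed.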

--
Please update your bookmarks! We moved to https://discuss.elastic.co/
---
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0470f9df-8d9a-48ef-9dbd-a90c8f2db194%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How to get Elasticsearch boolean match working for multiple fields

Jason Wee
What ES version is that?


Re: How to get Elasticsearch boolean match working for multiple fields

Dominic Nicholas
Hi - version 1.5.0 of ES, 4.10.4 of Lucene.

Dom


Re: How to get Elasticsearch boolean match working for multiple fields

Allan Mitchell
In reply to this post by Dominic Nicholas
Hi

Have a look at the below and see if it is what you want.

DELETE /testingindex

PUT /testingindex
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "mytesttype" : {
            "_source" : { "enabled" : false },
            "properties" : {
                "message" : { "type" : "string", "index" : "analyzed" },
                "path" : { "type" : "string", "index" : "analyzed" }
            }
        }
    }
}

POST /testingindex/mytesttype/1
{
    "message": "Failed password for some user or another",
    "path":"/wrong/path/"
}
POST /testingindex/mytesttype/2
{
    "message": "Not the right message but the right path",
    "path":"/var/log/secure"
}
POST /testingindex/mytesttype/3
{
    "message": "Failed password for some user or another",
    "path":"/var/log/secure"
}
POST /testingindex/mytesttype/4
{
    "message": "Nothing is right here",
    "path":"/wrong/path/too"
}


GET /testingindex/mytesttype/_search

GET /testingindex/mytesttype/_search
{
    "query": {
        "bool": {
            "must": [
                { "match_phrase" : { "message" : "Failed password for some" } },
                { "match_phrase" : { "path" : "/var/log/secure" } }
            ]
        }
    }
}


Re: How to get Elasticsearch boolean match working for multiple fields

Dominic Nicholas
Hi Allan, I really appreciate the thoughtful response. One comment before I try what you are suggesting... Our path and message field mappings indicate not_analyzed, and we don't want to change them at this point. Someone suggested using the .raw versions of the fields (path.raw and message.raw), which does work. However, it leaves me with a question: if the original field mappings indicate the fields are not_analyzed, why is it necessary to use the .raw versions?
Cheers
Dom
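
For reference, here is a sketch of what the .raw suggestion could look like as a request body, built in Python so it can be printed and pasted into a curl call. It assumes the Logstash mapping shown earlier in the thread. Keeping match_phrase on the analyzed "message" field and moving only the path check to a term query on "path.raw" is an illustrative choice, not something confirmed in this thread, and build_query is a made-up helper name. The time-range filter from the original query is left out for brevity.

```python
import json


def build_query(phrase, exact_path):
    """Build a search body: phrase match on `message`, exact match on `path.raw`."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Analyzed field: matches the phrase anywhere in the message.
                    {"match_phrase": {"message": phrase}},
                    # not_analyzed sub-field: the whole stored value must be equal.
                    {"term": {"path.raw": exact_path}},
                ]
            }
        }
    }


body = build_query("Failed password for", "/var/log/secure")
print(json.dumps(body, indent=2))
```

With a body like this, a path value of just 'var' would be compared against the whole stored string in "path.raw" and so should not match '/var/log/secure'.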


Re: How to get Elasticsearch boolean match working for multiple fields

Allan Mitchell
Dominic

The usual convention is that "Field" is analyzed and "Field.raw" is not analyzed. I'm not sure why you would have both as not analyzed, given they would then do the same thing, all else being equal.

When I run your original query on fields I know are not_analyzed, I get no results, because no value in those fields matches the query string exactly.

I could of course use a regexp query instead:

GET /testingindex/mytesttype/_search
{
    "query": {
        "bool": {
            "must": [
                { "regexp" : { "message" : ".*Failed password for.*" } },
                { "regexp" : { "path" : ".*/var/log/secure.*" } }
            ]
        }
    }
}
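
To see why the original match_phrase on the analyzed "path" field still matched when given just 'var', it helps to look at the tokens. The sketch below is a rough Python approximation of the default standard analyzer (lowercase, then split on non-alphanumeric characters). It is not the real Lucene tokenizer, which handles some characters differently, and rough_tokens and phrase_matches are made-up helper names.

```python
import re


def rough_tokens(text):
    """Approximate the standard analyzer: lowercase and split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def phrase_matches(field_value, query_text):
    """True if the query tokens occur as a consecutive run in the field tokens."""
    field = rough_tokens(field_value)
    query = rough_tokens(query_text)
    if not query:
        return False
    return any(
        field[i:i + len(query)] == query
        for i in range(len(field) - len(query) + 1)
    )


stored_path = "/var/log/secure"
print(rough_tokens(stored_path))           # tokens indexed for the analyzed field
print(phrase_matches(stored_path, "var"))  # analyzed field: one matching token is enough
print(stored_path == "var")                # not_analyzed field: whole value is compared
```

On the analyzed field the stored path is broken into separate tokens, so a one-word phrase such as 'var' finds a match. On a not_analyzed field the whole string is a single term, so only the full value matches.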





On 8 May 2015 at 15:03, Dominic Nicholas <[hidden email]> wrote:
Hi Alan, I really appreciate the thoughtful response.  One comment before I try what you are suggesting... Our path and message fields mappings indicate not_analyzed, and we don't want to change them at this point. Someone suggested using the .raw versions of the fields (path.raw and message.raw, which does work. However, it leaves me with the question : If the original field mappings indicate the fields are not_analyzed, why is it necessary to use the .raw version ?
Cheers
Dom

On Fri, May 8, 2015 at 6:37 AM, Allan Mitchell <[hidden email]> wrote:
Hi

Have a look at the below and see if it is what you want.

DELETE /testingindex

PUT /testingindex
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "mytesttype" : {
            "_source" : { "enabled" : false },
            "properties" : {
                "message" : { "type" : "string", "index" : "analyzed" },
                "path" : {"type": "string", "index": "analyzed"
            }
        }
    }
}
}

POST /testingindex/mytesttype/1
{
    "message": "Failed password for some user or another",
    "path":"/wrong/path/"
}
POST /testingindex/mytesttype/2
{
    "message": "Not the right message but the right path",
    "path":"/var/log/secure"
}
POST /testingindex/mytesttype/3
{
    "message": "Failed password for some user or another",
    "path":"/var/log/secure"
}
POST /testingindex/mytesttype/4
{
    "message": "Nothing is right here",
    "path":"/wrong/path/too"
}


GET /testingindex/mytesttype/_search

GET /testingindex/mytesttype/_search
{
    "query": {
        "bool": {
            "must": [
             {  "match_phrase" : { "message" : "Failed password for some" } },
             {  "match_phrase" : { "path" : "/var/log/secure" } }
             
            ]
        }
    }
}

On 8 May 2015 at 02:07, Dominic Nicholas <[hidden email]> wrote:

Hi,

I need some expert guidance on trying to get a bool match working. I'd like the query to only return a successful search result if both 'message' matches 'Failed password for', and 'path' matches '/var/log/secure'.

This is my query :

curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
    "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
    "query" : {
        "bool" : {
            "must" : [
                {  "match_phrase" : { "message" : "Failed password for" } },
                {  "match_phrase" : { "path"    : "/var/log/secure"     } }
            ]
        }
    }
} '

Here is the start of the output from the search :

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 13.308596,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 13.308596,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    }, ...

The problem is if I change '/var/log/secure' to just 'var' say, and run the query, I still get a result, just with a lower score. I understood the bool...must construct meant both match terms here would need to be successful. What I'm after is no result if 'path' doesn't exactly match '/var/log/secure'...

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 10.354593,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 10.354593,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    },...

I checked the mappings for these fields to check that they are not analyzed :

curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'

I think these fields are non analyzed and so I believe the search will not be analyzed too (based on some training documentation I read recently from elasticsearch). Here is a snippet of the output _mapping for this index below.

      ....
      "message" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      "path" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      ....

Where am I going wrong (in a bunch of places I'm sure), what am I misunderstanding here (probably a lot!) ?

Any help would be much appreciated!

Thanks

--
Please update your bookmarks! We moved to https://discuss.elastic.co/
---
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0470f9df-8d9a-48ef-9dbd-a90c8f2db194%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to get Elasticsearch boolean match working for multiple fields

Dominic Nicholas
Hi - thanks again - I was misunderstanding the following :

"path" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      }

This says that the path field itself is analyzed (it uses the default analyzer, since there is no 'index: not_analyzed' on it), and that only the sub-field 'raw' is not analyzed. One solution for me will be to simply use the path.raw field instead of the path field. I'll also try the regexp. Thanks again for the help!
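
A minimal sketch of that change, assuming a term query on path.raw is acceptable for the exact match. The helper name build_query is hypothetical, and the @timestamp range filter from the original request is left out for brevity. The message clause stays a match_phrase on the analyzed field, since only part of the message is being searched for.

```python
import json


def build_query(message_phrase, exact_path):
    """Build a search body (hypothetical helper, not part of any client library).

    The message clause is a phrase match on the analyzed `message` field;
    the path clause is an exact match on the not_analyzed `path.raw` sub-field.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Analyzed field: match the phrase against its tokens.
                    {"match_phrase": {"message": message_phrase}},
                    # not_analyzed sub-field: the whole string is one term,
                    # so only the exact path can match.
                    {"term": {"path.raw": exact_path}},
                ]
            }
        }
    }


body = build_query("Failed password for", "/var/log/secure")
print(json.dumps(body, indent=4))
```

The printed JSON can be passed to curl with -d in the same way as the original request. With this body, changing the path to 'var' should return no hits, because 'var' is not the complete value stored in path.raw.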
Dom

On Fri, May 8, 2015 at 10:35 AM, Allan Mitchell <[hidden email]> wrote:
Dominic

The usual convention is that "Field" is analyzed and "Field.raw" is not analyzed. I'm not sure why you would have both as not analyzed, given that they would do the same thing, all else being equal.

When I run your original query above against fields I know are not_analyzed, I get no results, because no string in those fields matches the terms exactly.

I could of course use a regexp query instead:

GET /testingindex/mytesttype/_search
{
    "query": {
        "bool": {
            "must": [
                { "regexp" : { "message" : ".*Failed password for.*" } },
                { "regexp" : { "path" : ".*/var/log/secure.*" } }
            ]
        }
    }
}
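
The leading and trailing .* matter here because a regexp query has to match the entire term, and on a not_analyzed field the term is the whole field value. A small sketch of that anchoring, using Python's re.fullmatch as a stand-in (an assumption: Lucene's regexp syntax differs from Python's in general, but these simple patterns behave the same way; matches_whole_term is a hypothetical helper):

```python
import re


def matches_whole_term(pattern, term):
    """Emulate an anchored regexp: the pattern must cover the entire term."""
    return re.fullmatch(pattern, term) is not None


# On a not_analyzed field the single indexed term is the full field value.
path_term = "/var/log/secure"

print(matches_whole_term("var", path_term))                   # partial pattern
print(matches_whole_term(".*var.*", path_term))               # wildcards on both sides
print(matches_whole_term(".*/var/log/secure.*", path_term))   # pattern from the query above
```

Only the first call fails to match, which is why a bare 'var' finds nothing while the wildcarded patterns do.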





On 8 May 2015 at 15:03, Dominic Nicholas <[hidden email]> wrote:
Hi Allan, I really appreciate the thoughtful response. One comment before I try what you are suggesting: our path and message field mappings indicate not_analyzed, and we don't want to change them at this point. Someone suggested using the .raw versions of the fields (path.raw and message.raw), which does work. However, it leaves me with the question: if the original field mappings indicate the fields are not_analyzed, why is it necessary to use the .raw version?
Cheers
Dom

On Fri, May 8, 2015 at 6:37 AM, Allan Mitchell <[hidden email]> wrote:
Hi

Have a look at the below and see if it is what you want.

DELETE /testingindex

PUT /testingindex
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "mytesttype" : {
            "_source" : { "enabled" : false },
            "properties" : {
                "message" : { "type" : "string", "index" : "analyzed" },
                "path" : { "type" : "string", "index" : "analyzed" }
            }
        }
    }
}

POST /testingindex/mytesttype/1
{
    "message": "Failed password for some user or another",
    "path":"/wrong/path/"
}
POST /testingindex/mytesttype/2
{
    "message": "Not the right message but the right path",
    "path":"/var/log/secure"
}
POST /testingindex/mytesttype/3
{
    "message": "Failed password for some user or another",
    "path":"/var/log/secure"
}
POST /testingindex/mytesttype/4
{
    "message": "Nothing is right here",
    "path":"/wrong/path/too"
}


GET /testingindex/mytesttype/_search

GET /testingindex/mytesttype/_search
{
    "query": {
        "bool": {
            "must": [
                { "match_phrase" : { "message" : "Failed password for some" } },
                { "match_phrase" : { "path" : "/var/log/secure" } }
            ]
        }
    }
}
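
A rough way to see how the four test documents above behave on analyzed fields is to imitate the analysis step by hand. The sketch below is only an approximation: analyze lowercases and splits on non-alphanumeric characters, which is an assumption standing in for the real standard analyzer, and analyze, phrase_match and search are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(field_text, query_text):
    """True if the query tokens appear consecutively in the field tokens."""
    tokens = analyze(field_text)
    phrase = analyze(query_text)
    n = len(phrase)
    return n > 0 and any(
        tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)
    )


# The four test documents indexed above.
docs = {
    1: {"message": "Failed password for some user or another", "path": "/wrong/path/"},
    2: {"message": "Not the right message but the right path", "path": "/var/log/secure"},
    3: {"message": "Failed password for some user or another", "path": "/var/log/secure"},
    4: {"message": "Nothing is right here", "path": "/wrong/path/too"},
}


def search(message_phrase, path_phrase):
    """bool/must: both phrase clauses have to match the same document."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if phrase_match(doc["message"], message_phrase)
        and phrase_match(doc["path"], path_phrase)
    ]


print(analyze("/var/log/secure"))
print(search("Failed password for some", "/var/log/secure"))
print(search("Failed password for some", "var"))
```

Under these assumptions the path is indexed as the separate tokens var, log and secure, so both searches return only document 3. The must clause is doing its job; it is the analyzed path field that lets 'var' count as a match.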

