Wildcard analyze does not work for "the"

Wildcard analyze does not work for "the"

Maciej Wiercinski
Hi, 

I'm struggling to get a wildcard query to work when searching for the string "The Times". As far as I understand, the analyzer should remove "The" as a stop word while indexing the field. However, the same removal does not seem to be applied to the wildcard term, regardless of the "analyze_wildcard" setting. I've tried changing the mapping on the "name" field to not_analyzed, but it didn't help.

Should I report it as a bug, or am I missing something? 

Full example:

$ curl -XDELETE 127.0.0.1:9200/test_index?pretty
{
  "ok" : true,
  "acknowledged" : true
}

$ curl -XPUT 127.0.0.1:9200/test_index/test_type/1?pretty -d '{ "name": "The Times" }';
{
  "ok" : true,
  "_index" : "test_index",
  "_type" : "test_type",
  "_id" : "1",
  "_version" : 1
}


$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d '{"query":{"query_string":{"query":"the times*","default_operator":"AND", "analyze_wildcard": "true" }}}' 
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
   "name": "The Times"
}
    } ]
  }
}


$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d '{"query":{"query_string":{"query":"the* times*","default_operator":"AND", "analyze_wildcard": "true" }}}' 
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d '{"query":{"query_string":{"query":"the times*","default_operator":"AND", "analyze_wildcard": "false" }}}' 
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
   "name": "The Times"
}
    } ]
  }
}

$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d '{"query":{"query_string":{"query":"the* times*","default_operator":"AND", "analyze_wildcard": "false" }}}' 
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Kind regards,
Maciej Wiercinski

Re: Wildcard analyze does not work for "the"

Clinton Gormley-2
Hi Maciej

'the' is a stopword, and so it is being removed.  You can disable
stopwords with a custom analyzer.

clint
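
As a rough illustration of the custom analyzer suggestion, here is a minimal Python sketch that only builds the JSON body for an index-creation request. It assumes a "custom" analyzer made of the standard tokenizer and a lowercase filter, with no stop filter, registered under the name "default" so that it applies to fields without an explicit analyzer. These names and the exact behaviour depend on the Elasticsearch version in use, so treat it as an assumption to check against the documentation rather than a confirmed fix.

```python
import json

# Hypothetical analyzer definition: standard tokenizer plus lowercasing,
# with no stop filter, so "the" would be kept as an indexed term.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Assumption: an analyzer named "default" is used for fields
                # that have no analyzer set explicitly in their mapping.
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def build_settings_body(settings=index_settings):
    """Serialize the settings as the JSON body of an index-creation request."""
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(build_settings_body())
```

The printed body could be passed with -d when creating test_index, before the document is indexed, in the same way as the curl calls in the original post.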


Re: Wildcard analyze does not work for "the"

Maciej Wiercinski
Hi Clinton

I do understand that "the" is a stopword, but I still reckon it's
a bug. If "the" is removed from a query of the form "the AND times*"
and that search returns the document, then "the* AND times*" should
return it too: analyze_wildcard should remove the "the*" part and
make the search equivalent to "times*".

Any thoughts?

Kind regards,
Maciej
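
To make the two readings concrete, here is a toy Python model that reproduces the results reported in this thread. It is not Elasticsearch's actual code: the stopword set, the analyze() helper and the matches() helper are all hypothetical, and the model simply assumes that plain terms pass through the stop filter while terms ending in "*" are compared as prefixes against the indexed terms.

```python
STOPWORDS = {"the", "a", "an", "and", "of"}  # small illustrative subset


def analyze(text):
    """Lowercase, split on whitespace, and drop stopwords (toy analyzer)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def matches(query, indexed_terms):
    """AND together all clauses of a whitespace-separated query.

    Plain terms go through the analyzer, so stopwords vanish from the query.
    Terms ending in '*' are treated as prefixes and compared directly with
    the indexed terms, without stopword removal.
    """
    for clause in query.lower().split():
        if clause.endswith("*"):
            prefix = clause[:-1]
            if not any(term.startswith(prefix) for term in indexed_terms):
                return False
        else:
            for term in analyze(clause):
                if term not in indexed_terms:
                    return False
    return True


indexed = analyze("The Times")  # only "times" survives indexing

if __name__ == "__main__":
    print(matches("the times*", indexed))   # True
    print(matches("the* times*", indexed))  # False
```

Under these assumptions "the*" fails because no indexed term starts with "the", which is the behaviour seen above. Whether the wildcard term should instead be dropped like a plain stopword is exactly the open question.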

On Aug 19, 9:56 am, Clinton Gormley <[hidden email]> wrote:

> Hi Maciej
>
> 'the' is a stopword, and so is being removed.  You can disable stopwords
> with a custom analyzer
>
> clint