Facet counts don't match the real values


Tania
Hi,
I am working with Elasticsearch and faceted search. It worked great at
first, but after testing various cases I have noticed that the count
returned by the ES server does not always match the expected value, and
I would like to know whether this is my fault for not using it properly.
Consider the following example.
I define an analyzer that splits on semicolons to extract each of the
terms for faceting:
curl -XPOST http://localhost:9200/test/ -d '{
  "settings" : {"analysis" : {"analyzer" : {
    "semicolon" : {"type" : "pattern", "pattern" : ";"}}}},
  "mappings" : {"news" : {"properties" : {
    "tags_an" : {"type" : "string", "analyzer" : "semicolon"}}}}
}'
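
As an illustration only, here is a rough Python approximation of what such a semicolon analyzer does to a tag string at index time. It assumes the pattern analyzer splits on the given regular expression, drops empty tokens and lowercases by default; `semicolon_tokens` and the sample value are made up for this sketch and are not part of Elasticsearch.

```python
import re

def semicolon_tokens(value, pattern=";", lowercase=True):
    """Approximate a pattern analyzer: split on `pattern`, optionally lowercase."""
    tokens = [t for t in re.split(pattern, value) if t]  # skip empty tokens
    return [t.lower() for t in tokens] if lowercase else tokens

# Hypothetical tags value for a single document.
tokens = semicolon_tokens("Innovation;Open Government;Science")
print(tokens)
```

Each multi-word tag stays a single token, which is why a terms facet on `tags_an` can report whole tags such as "open government" rather than single words.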

curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
 "query": {
   "query_string" :{
     "fields" : ["title", "description", "tags"],
     "query": "xxx"
   }
 },
 "facets": {
   "tags": {
     "terms": {
       "field" : "tags_an"
     }
   }
 }
}'

All the facets returned by the ES server are presented to the user to
help her narrow the results in the next search.
Suppose the server returns the following for the previous query:
{
 "took" : 6,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
 },
 "hits" : {
    "total":20
     ...
 },
 "facets" : {
   "tags" : {
     "_type" : "terms",
     "missing" : 15,
     "terms" : [ {
       "term" : "innovation",
       "count" : 10
     }, {
       "term" : "open government",
       "count" : 4
     }, {
       "term" : "science",
       "count" : 2
     } ]
   }
 }
}

The user is interested in the "open government" facet, so she
clicks on it and a new request is sent to the ES server:
curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
 "query": {
   "query_string" :{
     "fields" : ["title", "description", "tags"],
     "query": "xxx AND tags:open government"
   }
 },
 "facets": {
   "tags": {
     "terms": {
       "field" : "tags_an"
     }
   }
 }
}'
But now, surprisingly, the number of hits returned is not 4, as
expected, but 6!!
{
 "took" : 5,
 "timed_out" : false,
 "_shards" : {
   "total" : 6,
   "successful" : 5,
   "failed" : 0
 },
 "hits" : {
   "total":6
...
 },
 "facets" : {
   "tags" : {
     "_type" : "terms",
     "missing" : 15,
     "terms" : [
      {
       "term" : "open government",
       "count" : 5
     }, {
       "term" : "science",
       "count" : 2
     } ]
   }
 }
}


In many cases the returned result matches the expected value, but
when the requested value contains spaces or special characters the
result is not always correct. Am I making an error in the query
string? Should I escape whitespace? I have used faceted search in
other projects but I have never noticed this behaviour before.
Any help will be appreciated!
Thanks in advance!


Clinton Gormley-2

Hi Tania

> And the user is interested in the "open government" facet. So she
> clicks in it and a new request is generated to the es server:
> curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
>  "query": {
>    "query_string" :{
>      "fields" : ["title", "description", "tags"],
>      "query": "xxx AND tags:open government"

This is actually a query for:

        "xxx" AND "tags:open" OR "government"

You could change it to:

     "query": "xxx AND tags:\"open government\""

Or alternatively:

curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
   "query" : {
      "filtered" : {
         "filter" : {
            "term" : {
               "tags" : "open government"
            }
         },
         "query" : {
            "query_string" : {
               "fields" : [ "title","description" ],
               "query" : "xxx"
            }
         }
      }
   }
}
'
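
A minimal Python sketch of the first suggestion, for a client that builds the narrowed query from whichever facet the user clicked. The helper names `facet_clause` and `narrowed_query` are hypothetical; the only assumption about the query syntax is that a double-quoted phrase keeps a multi-word value together and that a backslash escapes quotes inside it.

```python
import json

def facet_clause(field, value):
    """Return field:"value" with backslashes and double quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

def narrowed_query(text, field, value):
    """Build a query_string body for the user's text AND one selected facet."""
    return {
        "query": {
            "query_string": {
                "fields": ["title", "description", "tags"],
                "query": f"{text} AND {facet_clause(field, value)}",
            }
        }
    }

body = narrowed_query("xxx", "tags", "open government")
print(json.dumps(body, indent=2))
```

Quoting every facet value this way, not just the ones with spaces, keeps the client code simple and avoids the accidental `OR` described above.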

clint


kimchy
Administrator
Adding to what Clinton said, filters are the much preferred way to go about doing it.

On Tue, Aug 2, 2011 at 1:47 PM, Clinton Gormley <[hidden email]> wrote:

> You could change it to:
>
>     "query": "xxx AND tags:\"open government\""
>
> Or alternatively: [...]



Tania
Thanks a lot! I knew it had to be my fault... elastic search never fails! :-)

Tania
In reply to this post by kimchy
Hi again!
Following your recommendation to use filters instead of building
large and error-prone queries, I have tried using filters. Everything
went OK (fantastic documentation!) but, oh dear, I suspect that facets
are calculated on all the matches, that is, the filtered-out ones are
not discarded. Is this expected?
I need to extract facets for all my searches, so even if filtering is
the appropriate solution I think that in my case it doesn't fit. Or is
there another type that I could use to combine filters and facets (and
obtain facets only for the filtered results)?
Thanks!

On 2 Aug, 13:11, Shay Banon <[hidden email]> wrote:

> Adding to what Clinton said, filters are the much preferred way to go about
> doing it.

Ivan Brusic
Not sure if I completely understand your question/scenario, but you might be looking for facet filters: http://www.elasticsearch.org/guide/reference/api/search/facets/filter-facet.html

Facet filters will "discard" the filtered matches from the facets.

-- 
Ivan

On Tue, Aug 2, 2011 at 12:39 PM, tania <[hidden email]> wrote:

> Or is there another type that I could use to combine filters and facets
> (and obtain facets only for the filtered results)?


Clinton Gormley-2
In reply to this post by Tania
Hi Tania

> I suspect that facets are calculated on all the matches, that is, the
> filtered-out ones are not discarded. [...]
> Or is there another type that I could use to combine filters and facets
> (and obtain facets only for the filtered results)?

I think it is likely that you are doing something wrong.

Please gist (http://gist.github.com/gists) an example of what you are
doing, the results you are getting, and the results you would like to
get.

clint



kimchy
Administrator
If I understood the question, is the issue the fact that a filtered query causes facets to be computed on the filtered result set? If you don't want this behavior, one option is to use the top level filter element: http://www.elasticsearch.org/guide/reference/api/search/filter.html.
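
To make the difference concrete, here is a small Python sketch that builds both request bodies side by side. It only constructs JSON and sends nothing. The field names come from earlier in the thread; filtering on `tags_an` with the value "open government" is an assumption for the example, as are the helper names.

```python
import json

# Pieces shared by both requests. Filtering on "tags_an" with this value
# is an assumption made for the example.
TEXT_QUERY = {"query_string": {"fields": ["title", "description"], "query": "xxx"}}
TAG_FILTER = {"term": {"tags_an": "open government"}}
TAGS_FACET = {"tags": {"terms": {"field": "tags_an"}}}

def filtered_query_body():
    """Filter inside the query: hits and facets both see only filtered documents."""
    return {
        "query": {"filtered": {"query": TEXT_QUERY, "filter": TAG_FILTER}},
        "facets": TAGS_FACET,
    }

def top_level_filter_body():
    """Filter next to the query: hits are filtered, facets cover every query match."""
    return {"query": TEXT_QUERY, "filter": TAG_FILTER, "facets": TAGS_FACET}

for build in (filtered_query_body, top_level_filter_body):
    print(build.__name__)
    print(json.dumps(build(), indent=2))
```

With the first body the facet counts shrink as the user narrows the search; with the second they stay the same as for the unfiltered query, which matches the behaviour described in the question.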

On Tue, Aug 2, 2011 at 7:48 PM, Clinton Gormley <[hidden email]> wrote:

> I think it is likely that you are doing something wrong.
>
> Please gist (http://gist.github.com/gists) an example of what you are
> doing, the results you are getting, and the results you would like to
> get.




Tania
First of all, yesterday I did not fully understand what you were
proposing about filters, and I was implementing
http://www.elasticsearch.org/guide/reference/api/search/filter.html,
while what I actually need is the filtered query:
http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query.html.

Now I have reimplemented it with a filtered query, and I think that while
creating the gist I discovered what is happening, but I still do not
understand why.

Here is a little example with all the responses: https://gist.github.com/1122262

(1) Because I want to obtain tag facets for each search, and tags can
be several words long, I need an analyzer that splits on a delimiter
(in this case a semicolon) so that facets are calculated for the whole
tag.
So tags are indexed twice: once 'normally' and once with the analyzer.

To help the user find the appropriate result, I present all the facets
obtained in the previous search. The user can choose any facet, and
then a new request is sent to the ES server to recalculate the results,
now taking into account both the first query and the selected facet.
Imagine the user clicks on "West Asia" (steps 4, 5 and 6).

(4) In step 4 of the gist I show how I was doing this faceted search
before talking to you (the approach you do not recommend).

(5) In step 5 I try to do it with a filtered query (notice that I use an
'and' filter because the user could keep clicking on more facets to
narrow the search), but I apply it to the normal field (the one without
the analyzer). No results are returned! Why is this happening?

(6) I repeat the same search but filter on the analyzed field
(tags_analyzed). Now the results are what I expected!

I think I have found the solution that my app needs, but I don't
understand why this is happening.

I need to store tags twice, otherwise facets are not calculated
properly, and if I adopt this solution I would need to store the tags
field three times (because my app is in Spanish and I have to store the
tags both with and without accents!).


Is this all correct? What could be improved? Why does Elasticsearch
behave so differently on analyzed and non-analyzed fields?

Thanks, thanks, thanks!
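
One possible reading of steps 5 and 6, sketched in Python. The contents of the gist are not reproduced here, so the stored value, the analyzers and the helper names below are all assumptions. The one fact relied on is that a term filter does not analyze its input and only matches a token exactly as it was indexed.

```python
import re

def standard_like_tokens(value):
    """Very rough stand-in for the default analyzer: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())

def semicolon_tokens(value):
    """Rough stand-in for the semicolon analyzer: whole tags, lowercased."""
    return [t.lower() for t in value.split(";") if t]

def term_filter_matches(indexed_tokens, term):
    """A term filter needs an exact match against one indexed token."""
    return term in indexed_tokens

source = "West Asia;Energy"   # hypothetical stored tags value
term = "west asia"            # facet term as the terms facet would return it

normal_field_hit = term_filter_matches(standard_like_tokens(source), term)
semicolon_field_hit = term_filter_matches(semicolon_tokens(source), term)
print(normal_field_hit, semicolon_field_hit)
```

Under these assumptions the normal field only holds the single-word tokens "west" and "asia", so the two-word term finds nothing there, while the semicolon field holds "west asia" as one token and matches.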







On 2 Aug, 20:21, Shay Banon <[hidden email]> wrote:

> If I understood the question, is the issue the fact that a filtered query
> causes facets to be computed on the filtered result set? If you don't
> want this behavior, one option is to use the top level filter element:
> http://www.elasticsearch.org/guide/reference/api/search/filter.html.