Problem: Facets tokenize tags with spaces. Is there a solution?


Royce
Hi,

I'm using facets to filter search results. One tag, for example, is
a city name, "Kansas City". The facet interprets "Kansas City" as two
separate counts, "Kansas" and "City".

How can I configure facets to recognize "Kansas City" as one tag?


Take care,

Royce

Re: Problem: Facets tokenize tags with spaces. Is there a solution?

Royce
Similar topics:

http://groups.google.com/group/elasticsearch/browse_thread/thread/ec24d56db34275b1/ccdaa025a3bb2481?lnk=gst&q=facet+tokenize#ccdaa025a3bb2481


Re: Problem: Facets tokenize tags with spaces. Is there a solution?

Royce
In reply to this post by Royce
To clarify: my tag isn't "Kansas City". The tag is "location", and one
of the cities that shows up in it is "Kansas City".




Re: Problem: Facets tokenize tags with spaces. Is there a solution?

Ivan Brusic
In reply to this post by Royce
The topic you referenced has the answer. If the field you are faceting
on is a string, it needs to be either not analyzed or analyzed with
something like the KeywordAnalyzer, which treats the whole value as a
single token. Can you gist the mapping you are using? In your example,
it appears that location is being analyzed and indexed as two tokens,
"Kansas" and "City", which is the default behavior. The facet then
counts the two tokens as separate terms.
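To make the difference concrete, here is a small plain-Java model of what a terms facet ends up counting. It is a simplification and does not use Elasticsearch itself. The hypothetical `analyzedTokens` method only approximates an analyzed string field: it splits on non-letter/digit characters and lowercases each piece. `keywordTokens` mimics the keyword analyzer or "not_analyzed" by keeping the whole value as one term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class FacetTokenDemo {

    // Rough stand-in for an analyzed string field (NOT the real standard analyzer):
    // split on anything that is not a letter or digit, then lowercase each token.
    static List<String> analyzedTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Keyword analyzer or "not_analyzed": the whole value is indexed as one term.
    static List<String> keywordTokens(String value) {
        return Arrays.asList(value);
    }

    // Terms-facet model: count documents per distinct indexed term.
    static Map<String, Integer> facetCounts(List<String> values, boolean analyzed) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            List<String> terms = analyzed ? analyzedTokens(v) : keywordTokens(v);
            // A document counts once per term, even if the term repeats in its value.
            for (String term : new TreeSet<>(terms)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("Kansas City", "Denver", "Kansas City");
        // prints: analyzed:     {city=2, denver=1, kansas=2}
        System.out.println("analyzed:     " + facetCounts(locations, true));
        // prints: not_analyzed: {Denver=1, Kansas City=2}
        System.out.println("not_analyzed: " + facetCounts(locations, false));
    }
}
```

The facet itself does no splitting. It reports whatever terms were produced at index time, so the fix belongs in the mapping and not in the facet request.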

http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer.html

--
Ivan

http://www.elasticsearch.org/guide/reference/api/search/facets/


Re: Problem: Facets tokenize tags with spaces. Is there a solution?

Suraj
Hi,

I have the same problem as Royce. I have the following mapping:

curl -XPOST "http://localhost:9200/pictures" -d '
{
    "mappings" : {
        "pictures" : {
            "properties" : {
                "id": { "type": "string" },
                "description": {"type": "string", "index": "not_analyzed"},
                "featured": { "type": "boolean" },
                "categories": { "type": "string", "index": "not_analyzed" },
                "tags": { "type": "string", "index": "not_analyzed", "analyzer": "keyword" },
                "created_at": { "type": "double" }
            }
        }
    }
}'

My data is:

curl -X POST "http://localhost:9200/pictures/picture" -d '{
  "picture": {
    "id": "4defe0ecf02a8724b8000047",
    "title": "Victoria Secret PhotoShoot",
    "description": "From France and Italy",
    "featured": true,
    "categories": [
      "Fashion",
      "Girls"
    ],
    "tags": [
      "girl",
      "photoshoot",
      "supermodel",
      "Victoria Secret"
    ],
    "created_at": 1405784416.04672
  }
}'

My query is:
curl -X POST "http://localhost:9200/pictures/_search?pretty=true" -d '
{
  "query": {
    "text": {
      "tags": {
        "query": "Victoria Secret"
      }
    }
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'

The output is:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 0,
      "other" : 0,
      "terms" : [ ]
    }
  }
}

So I get a total of 0 in both the facets and the hits.

Any idea why it's not working? I know that when I remove the keyword analyzer from tags and leave it just "not_analyzed", I do get results, but then matching is case-sensitive.
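For the case-sensitivity part, a common approach in Elasticsearch is a custom analyzer that combines the `keyword` tokenizer with the `lowercase` token filter. The tag stays a single term but is stored in lowercase. The plain-Java sketch below only models that effect and does not use the Elasticsearch API. The class and method names are hypothetical. It assumes the query text is run through the same analyzer as the indexed tag, as a text/match query does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LowercaseKeywordDemo {

    // Models a custom analyzer made of the keyword tokenizer plus a lowercase
    // token filter: the whole value stays one term, but its case is normalized.
    static String lowercaseKeyword(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Assumes the query text goes through the same analyzer as the indexed tag,
    // so a match means the two normalized terms are equal.
    static boolean matches(String indexedTag, String queryText) {
        return lowercaseKeyword(indexedTag).equals(lowercaseKeyword(queryText));
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("girl", "photoshoot", "supermodel", "Victoria Secret");
        for (String query : Arrays.asList("Victoria Secret", "victoria secret", "Victoria")) {
            boolean hit = tags.stream().anyMatch(tag -> matches(tag, query));
            System.out.println(query + " -> " + hit);
        }
    }
}
```

One trade-off: the facet then reports the lowercased term ("victoria secret"). If the original capitalization is needed for display, a second not_analyzed copy of the field can hold it.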

Cheers!
Suraj



Re: Problem: Facets tokenize tags with spaces. Is there a solution?

mohammad
In reply to this post by Royce
Hello everyone,
I am new to Elasticsearch, and I am facing difficulties similar to those mentioned above. I tried implementing some of the suggested solutions, but to no avail.
I am posting part of my code and would be very grateful if somebody could help me out. Thanks in advance.

The code is written in Java:

// Index creation and mapping: all three fields use the custom "francais" analyzer
// (defined in the index settings loaded from configIndex)
CreateIndexRequestBuilder builder = client.admin().indices().prepareCreate(index)
        .setSettings(ImmutableSettings.settingsBuilder().loadFromSource(configIndex));

builder.addMapping("StatTest", "{\n" +
        "  \"StatTest\" : {\n" +
        "    \"_all\" : { \"analyzer\" : \"francais\" },\n" +
        "    \"properties\" : {\n" +
        "      \"idUser\" : { \"type\" : \"string\", \"analyzer\" : \"francais\" },\n" +
        "      \"loginOfUser\" : { \"type\" : \"string\", \"analyzer\" : \"francais\" },\n" +
        "      \"nameOfUser\" : { \"type\" : \"string\", \"analyzer\" : \"francais\" }\n" +
        "    }\n" +
        "  }\n" +
        "}");

// Sample documents stored in the index
{ "idUser": "0121", "loginOfUser": "login0121", "nameOfUser": "mona lisa" }
{ "idUser": "0122", "loginOfUser": "login0122", "nameOfUser": "James Dean" }

// Terms facet on the user's name, restricted to documents from 2014
// (switching the field to "loginOfUser" works as expected)
TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("nameOfUser");
SearchRequestBuilder srb1 = client.prepareSearch().setIndices(index).addFacet(fb);

AndFilterBuilder myFilters = FilterBuilders.andFilter();
myFilters.add(FilterBuilders.termFilter("year", "2014"));
FilterBuilder fbBuilder = FilterBuilders.andFilter(myFilters);
FilteredQueryBuilder q = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), fbBuilder);

SearchResponse sr = srb1.setQuery(q).execute().actionGet();

// Print every facet term with its document count
TermsFacet f = (TermsFacet) sr.getFacets().facetsAsMap().get("idOfUser");
for (TermsFacet.Entry entry : f) {
    String type = entry.getTerm().toString();
    System.out.println("term: " + type + ", count: " + entry.getCount());
}


When I facet on loginOfUser, everything works well. The variable type returns:
login0121
login0122

However, when I facet on nameOfUser, the following is returned:
mona
lisa
James
Dean

I want to retrieve each user's name as a single term ("mona lisa", "James Dean"). Am I missing something in my code?
I would be very thankful if anyone could help me with this.
Thanks in advance

Re: Problem: Facets tokenize tags with spaces. Is there a solution?

jsbonline2006
In reply to this post by Royce
Hi All,

Here is the solution that worked for me: define the field you want to facet on as a multi_field, as follows:

{
  "mappings": {
    "data": {
      "properties": {
        "name": {
          "type": "multi_field",
          "fields": {
            "name": {
              "type": "string",
              "index": "analyzed"
            },
            "untouched": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Here "name" is a multi_field. I use "name" for searching and "name.untouched" for faceting, so the facet returns each full value as a single term.

I was facing the same issue described earlier in this thread, and this mapping resolved it.
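The multi_field idea can be modeled in plain Java. Each value is indexed twice: once as analyzed tokens under "name" and once as a single untouched term under "name.untouched". The facet then reads whichever sub-field it is pointed at. This is a simplified sketch using Mohammad's sample names. The whitespace-and-lowercase splitting is an assumption standing in for a real analyzer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class MultiFieldDemo {

    // Hypothetical in-memory "index": sub-field name -> per-document list of terms.
    static Map<String, List<List<String>>> index(List<String> names) {
        Map<String, List<List<String>>> fields = new HashMap<>();
        fields.put("name", new ArrayList<>());
        fields.put("name.untouched", new ArrayList<>());
        for (String value : names) {
            // "name": analyzed (here: lowercased and split on whitespace), for full-text search.
            List<String> analyzed = new ArrayList<>();
            for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) {
                    analyzed.add(t);
                }
            }
            fields.get("name").add(analyzed);
            // "name.untouched": not_analyzed, the original value kept as one term.
            fields.get("name.untouched").add(Arrays.asList(value));
        }
        return fields;
    }

    // Terms-facet model: count documents per distinct term of one sub-field.
    static Map<String, Integer> termsFacet(Map<String, List<List<String>>> fields, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> docTerms : fields.get(field)) {
            for (String term : new TreeSet<>(docTerms)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> fields = index(Arrays.asList("mona lisa", "James Dean"));
        System.out.println("facet on name:           " + termsFacet(fields, "name"));
        System.out.println("facet on name.untouched: " + termsFacet(fields, "name.untouched"));
    }
}
```

Applied to Mohammad's case, this would mean mapping nameOfUser as a multi_field with an "untouched" sub-field and pointing the terms facet at "nameOfUser.untouched", while queries keep using "nameOfUser".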

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar
