need count of terms using facets, taking space into account.

anjesh
Hi,

I have the following data in ES:
{
"title": "title1 of project",
"organization": "XYZ company"
},
{
"title": "title2 of project",
"organization": "XYZ company"
},
{
"title": "title3 of project",
"organization": "ABC company"
},

I need the number of documents per organization, as follows:
"ABC company": 1
"XYZ company": 2

I tried using facets, but the terms facet counts individual words rather than whole values:

curl -X POST 'http://localhost:9200/testcompany/activity/_search?pretty=true' -d '
{ "query" : {"match_all": {} },
  "facets" : {"organization" : {"terms" : {"field": "organization"}}}}'

gives

"facets" : {
    "organization" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 6,
      "other" : 0,
      "terms" : [ {
        "term" : "company",
        "count" : 3
      }, {
        "term" : "xyz",
        "count" : 2
      }, {
        "term" : "abc",
        "count" : 1
      } ]
    }

I don't know whether there is an option that counts the whole phrase rather than the individual words in the facet terms. I searched here and there but couldn't find anything.

Thanks
Anjesh.

jagdeep
It basically depends on the mapping. You are using the default standard
analyzer; instead, you need to use the keyword analyzer.

Regards
Jagdeep
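The gap between the facet output and the wanted counts comes down to analysis. Here is a minimal Python sketch of the two analyzers jagdeep mentions. It assumes the standard analyzer can be approximated as "split on non-alphanumeric characters, then lowercase" (the real one follows Unicode word-break rules and, in older versions, also removed English stop words by default), and that the terms facet counts each term once per document. `standard_like`, `keyword_like` and `terms_facet` are hypothetical helpers, not Elasticsearch APIs.

```python
import re
from collections import Counter

def standard_like(text):
    # Rough approximation of the standard analyzer: split on runs of
    # non-alphanumeric characters, then lowercase each token.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def keyword_like(text):
    # The keyword analyzer emits the whole value as a single token.
    return [text]

def terms_facet(values, analyzer):
    # Count each distinct term once per document, as the terms facet did.
    counts = Counter()
    for value in values:
        counts.update(set(analyzer(value)))
    return counts

orgs = ["XYZ company", "XYZ company", "ABC company"]
print(terms_facet(orgs, standard_like).most_common())
# [('company', 3), ('xyz', 2), ('abc', 1)]
print(terms_facet(orgs, keyword_like).most_common())
# [('XYZ company', 2), ('ABC company', 1)]
```

The first result reproduces the facet from the question; the second is the per-organization count anjesh asked for.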

Sumit Guptaa
Hi Jagdeep,

I am facing the same problem. Could you give me a mapping example for this? It would be very helpful.

Thanks
Sumit Gupta

Marcin Dojwa
Hi,

I think setting "index" to "not_analyzed" for the 'organization' field should work the way you want.

Best regards.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/need-count-of-terms-using-facets-taking-space-into-account-tp3956699p3958655.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.
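Marcin's suggestion could look like the mapping below. This minimal Python sketch only builds and prints the put-mapping body; it assumes the 0.19-era syntax used throughout this thread (`"type": "string"` with `"index": "not_analyzed"`), the helper `not_analyzed_mapping` is hypothetical, and the type name `activity` is taken from anjesh's search URL.

```python
import json

def not_analyzed_mapping(type_name, field_name):
    # Build a put-mapping body: "index": "not_analyzed" stores the whole
    # value as one untouched term, so a terms facet counts "XYZ company"
    # instead of "xyz" and "company" separately.
    return {
        type_name: {
            "properties": {
                field_name: {"type": "string", "index": "not_analyzed"}
            }
        }
    }

body = not_analyzed_mapping("activity", "organization")
print(json.dumps(body, indent=2))
```

The printed body would be sent to the type's `_mapping` endpoint before indexing. Switching an existing field from analyzed to not_analyzed typically conflicts with its current mapping, so it is usually simplest to delete and recreate the index, then reindex the documents.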

Sumit Guptaa
Hi Marcin,

"not_analyzed" does not work for getting the phrase count. If you have any idea how to search for a phrase using a facet query, please help me.

Thanks
Sumit Gupta

sujoysett
First, register custom analyzers with your own configuration, in a format like the following, using the create index API:

{
    "index": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
        "analysis": {
            "analyzer": {
                "standard1": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "lowercase"
                    ]
                },
                "keyword1": {
                    "type": "custom",
                    "tokenizer": "keyword"
                },
                "keyword2": {
                    "type": "pattern",
                    "pattern": ","
                }
            }
        }
    }
}

Next, assign these analyzers to the individual fields of the documents you are going to index. Use something like the following with the put mapping API:

{
    "mediasource": {
        "properties": {
            "mediaSourceTypeId": {
                "index": "analyzed",
                "type": "integer"
            },
            "isuName": {
                "analyzer": "keyword1",
                "type": "string"
            },
            "newsCategories": {
                "properties": {
                    "category": {
                        "analyzer": "keyword1",
                        "type": "string"
                    },
                    "category_words": {
                        "analyzer": "keyword1",
                        "type": "string"
                    },
                    "score": {
                        "index": "analyzed",
                        "type": "double"
                    }
                }
            }
        }
    }
}

For your data, you have to analyze the "organization" field with the keyword analyzer, just as I did for the "isuName" field in my example.
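The three analyzers registered above behave quite differently on the same value. The Python sketch below mimics what each would emit; the functions are hypothetical stand-ins named after sujoysett's analyzers, not Elasticsearch code. It assumes the standard tokenizer can be approximated by splitting on non-alphanumeric characters and that the pattern analyzer lowercases by default.

```python
import re

def standard1(text):
    # custom: "standard" tokenizer + "standard"/"lowercase" filters,
    # approximated as splitting on non-alphanumeric runs and lowercasing.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def keyword1(text):
    # custom: "keyword" tokenizer, no filters -> whole value, case kept.
    return [text]

def keyword2(text):
    # "pattern" analyzer with pattern ",": split on commas; this analyzer
    # lowercases by default, so the sketch does too.
    return [t.lower() for t in text.split(",") if t]

value = "ABC company,XYZ company"
for analyzer in (standard1, keyword1, keyword2):
    print(analyzer.__name__, analyzer(value))
```

Only keyword1 keeps a single organization such as "ABC company" as one facet term with its original case; keyword2 would suit a field that holds several comma-separated values.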


Sumit Guptaa
Hi sujoysett,
thanks for your quick response.

After applying the mapping you defined, a phrase search gives no hits, and a term search gives results like the following. I index this document:

curl -XPUT 'http://localhost:9200/my_twitter1/my_tweet/1' -d '{
    "user" : "hi hello how",
    "post_date" : "2011-09-20T16:20:00",
    "message" : "abc xyz def abc xyz def"
}'

and then apply this facet query:

curl -X POST 'localhost:9200/my_twitter1/my_tweet/_search?pretty=true' -d '{
    "query": {
        "term": {
            "message": "abc"
        }
    },
    "facets": {
        "message": {
            "terms": {
                "field": "message"
            }
        }
    }
}'

I get a count for each individual word, like "abc":1, "xyz":1, "def":1, and when I search for "abc xyz" there is no hit.

So please help me: how can I search for "abc xyz" and also get the count of "abc xyz" using a facet query?

Thanks
Sumit Gupta

jagdeep
Sumit, change the analyzer of this field to keyword, as Sujoy explained. By
default it uses the standard analyzer.
"message": {
    "analyzer": "keyword",
    "type": "string"
},

Regards
Jagdeep
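Why does a term query for "abc xyz" find nothing under either analyzer? A term query is not analyzed: it looks up the exact string among the tokens indexed for the field. The Python sketch below models only that lookup, using rough stand-ins for the standard and keyword analyzers (hypothetical helpers, not Elasticsearch code) on the message from Sumit's example.

```python
import re

def standard_like(text):
    # Approximation of the standard analyzer: split on non-alphanumerics, lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def keyword_like(text):
    # The keyword analyzer indexes the whole value as one token.
    return [text]

def term_query_hits(field_value, analyzer, term):
    # A term query matches only if the unanalyzed term equals an indexed token.
    return term in analyzer(field_value)

message = "abc xyz def abc xyz def"
for name, analyzer in (("standard", standard_like), ("keyword", keyword_like)):
    for term in ("abc", "abc xyz", message):
        print(name, repr(term), term_query_hits(message, analyzer, term))
```

With the standard analyzer only single words match; with the keyword analyzer only the complete message matches. Neither indexes "abc xyz" as a term of its own, so neither can count it in a facet.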

Sumit Guptaa
Hi Jagdeep,

could you please give me an example of searching for something like "abc xyz" with a facet query that also returns the count for "abc xyz"?

I am unable to get a search for "abc xyz" working with a facet query.

Thanks,
Sumit Gupta

Ivan Brusic
Sumit,

I do not think you will be able to achieve what you want without
implementing a custom tokenizer. Tokenizing analyzers such as the
standard analyzer split on whitespace, and the keyword analyzer takes
the whole value without stemming or splitting. You need a tokenizer
that splits a string into different permutations of the terms.
Something like this tokenizer must already exist, but I do not think
it is part of the default Lucene/ElasticSearch packages.

Cheers,

Ivan
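The tokenizer Ivan describes, one that emits combinations of neighbouring words, is usually called word n-grams, or shingles in Lucene terms. Lucene ships a ShingleFilter and Elasticsearch documents a `shingle` token filter, so it is worth checking whether your version includes it before writing a custom tokenizer. The Python sketch below only models the idea: it emits 1- and 2-word shingles (the sizes are an assumption) and counts each distinct shingle once per document, which matches the per-document counts Sumit saw ("abc":1 although abc occurs twice). The second document is made-up sample data.

```python
from collections import Counter

def shingles(text, min_size=1, max_size=2):
    # Emit every run of min_size..max_size adjacent words, e.g. "abc xyz".
    words = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for i in range(len(words) - size + 1):
            out.append(" ".join(words[i:i + size]))
    return out

def shingle_facet(docs, **kwargs):
    # Count each distinct shingle once per document, like the terms facet.
    counts = Counter()
    for doc in docs:
        counts.update(set(shingles(doc, **kwargs)))
    return counts

docs = ["abc xyz def abc xyz def", "abc def"]
facet = shingle_facet(docs)
print(facet["abc xyz"], facet["abc"], facet["abc def"])  # 1 2 1

# A term query for "abc xyz" now matches the first document only.
matches = [d for d in docs if "abc xyz" in shingles(d)]
print(matches)
```

The trade-off is index size: each extra shingle size adds many more indexed terms, so the maximum size is usually kept small.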

Sumit Guptaa
Hi Ivan,

could you give me the full implementation for this, so that I can run a facet query on a phrase? Please help me.

Thanks,
Sumit Gupta

anjesh
hi

I managed to get what I was looking for with the following. Thanks, Jagdeep. I have posted it in its entirety, so it should work as-is.

curl -XDELETE 'http://localhost:9200/testcompany/'
curl -XPUT 'http://localhost:9200/testcompany/'
curl -XPUT 'http://localhost:9200/testcompany/activity1/_mapping' -d '{
    "activity1" : {
        "properties" : {
            "organization" : {"analyzer": "keyword", "type": "string"}
        }
    }
}'

curl -XPUT 'http://localhost:9200/testcompany/activity1/1' -d '{
    "title": "title1 of project",
    "organization": "ABC company"
}'

curl -XPUT 'http://localhost:9200/testcompany/activity1/2' -d '{
    "title": "title2 of project",
    "organization": "XYZ company"
}'

curl -XPUT 'http://localhost:9200/testcompany/activity1/3' -d '{
    "title": "title3 of project",
    "organization": "XYZ company"
}'

curl -X POST 'http://localhost:9200/testcompany/activity1/_search?pretty=true' -d '{
    "query" : {"match_all":{}},
    "facets" : {"organization" : {"terms" : {"field": "organization"}}}
}'

gives

"facets" : {
    "organization" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 3,
      "other" : 0,
      "terms" : [ {
        "term" : "XYZ company",
        "count" : 2
      }, {
        "term" : "ABC company",
        "count" : 1
      } ]
    }

However, now I can't search for "ABC" in the organization field, which seems to be what Sumit is asking about.

curl -X POST 'http://localhost:9200/testcompany/activity1/_search?pretty=true' -d '{
    "query" : {"term":{"organization": "ABC"}},
    "facets" : {"organization" : {"terms" : {"field": "organization"}}}
}'

gives 0 hits.

But

curl -X POST 'http://localhost:9200/testcompany/activity1/_search?pretty=true' -d '{
    "query" : {"term":{"organization": "ABC company"}},
    "facets" : {"organization" : {"terms" : {"field": "organization"}}}
}'

gives

  "facets" : {
    "organization" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 1,
      "other" : 0,
      "terms" : [ {
        "term" : "ABC company",
        "count" : 1
      } ]
    }

I think something is still missing, and I can't figure out what. The search is also case sensitive: "abc company" doesn't return any results. I don't fully understand the internals, notably tokens and analyzers, and I would appreciate it if somebody could point me to the relevant posts.

Best
Anjesh

jagdeep
You either have to use the pattern analyzer with the case-insensitive flag, as
explained here:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer.html

or you need to use a regex with the case-insensitive flag in your query
string.

Regards
Jagdeep
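Another option for anjesh's case-sensitivity problem, different from the two jagdeep lists, is a custom analyzer that combines the `keyword` tokenizer with the `lowercase` token filter, written in the same settings format sujoysett used. This is a minimal Python sketch under those assumptions: the analyzer name `keyword_lowercase` is hypothetical, the JSON follows the 0.19-era syntax used in this thread, and the functions only model the analyzer's effect rather than calling Elasticsearch. As in anjesh's example, term queries are not analyzed.

```python
import json
from collections import Counter

# Hypothetical index settings (sent when creating the index): the keyword
# tokenizer keeps the value whole, the lowercase filter folds its case.
settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "keyword_lowercase": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}
# Mapping that applies the analyzer to the organization field.
mapping = {
    "activity1": {
        "properties": {
            "organization": {"type": "string", "analyzer": "keyword_lowercase"}
        }
    }
}

def keyword_lowercase(text):
    # Model of the analyzer above: one token per value, lowercased.
    return [text.lower()]

orgs = ["ABC company", "XYZ company", "XYZ company"]
indexed = [keyword_lowercase(o) for o in orgs]

def term_query_count(term):
    # Term queries are not analyzed, so the term must equal an indexed token.
    return sum(term in tokens for tokens in indexed)

# Terms facet over the indexed tokens, counting each document once.
facet = Counter(token for tokens in indexed for token in set(tokens))

print(json.dumps(settings, indent=2))
print(json.dumps(mapping, indent=2))
print(facet.most_common())  # [('xyz company', 2), ('abc company', 1)]
print(term_query_count("abc company"), term_query_count("ABC"))  # 1 0
```

With this analyzer the facet terms come back lowercased, a term query has to use the lowercased full value, and a fragment such as "ABC" still does not match; matching parts of the value needs a different tokenizer, as Ivan pointed out.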