field count facet?

caphrim007
Hi folks,

I was reading the facet documentation and was wondering if there was a
way that I could get a field count of the documents in my index.

I see that in the examples one specifies a field to count, and then
gets results from that, but I was interested in seeing how many
documents have a field, vs has a particular field value.

So assuming the following data

doc1.field1 = "abc"
doc1.field2 = "123"
doc1.field3 = "fgh"
doc2.field1 = "abc"
doc2.field2 = "123"
doc3.field4 = "123"

it would return

field1: 2
field2: 2
field3: 1
field4: 1

Instead of

term: abc, count: 2
term: 123, count: 3
term: fgh, count: 1

I basically want to know which fields my documents contain, because
the set of fields that can exist is arbitrary.

Is this possible?

Thanks,
Tim
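To make the distinction concrete, here is a small local Python sketch over the example data, with plain dicts standing in for indexed documents (no Elasticsearch involved). The hypothetical `field_counts` gives the per-field document counts asked for above, while the hypothetical `term_counts` reproduces the per-value listing under "Instead of".

```python
from collections import Counter

# The example data, one dict per document (field name -> value).
docs = [
    {"field1": "abc", "field2": "123", "field3": "fgh"},
    {"field1": "abc", "field2": "123"},
    {"field4": "123"},
]


def field_counts(documents):
    """Number of documents in which each field is present."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())
    return dict(counts)


def term_counts(documents):
    """Occurrences of each value across all fields (the "Instead of" listing)."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.values())
    return dict(counts)


print(field_counts(docs))  # {'field1': 2, 'field2': 2, 'field3': 1, 'field4': 1}
print(term_counts(docs))   # {'abc': 2, '123': 3, 'fgh': 1}
```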

Re: field count facet?

Pavel Penchev
Hi,

If you make a terms facet on that field, the sum of the facet values' counts will give you the number of documents that have a value for this field. Alternatively, you could use a range query (see http://www.elasticsearch.org/guide/reference/query-dsl/range-query.html). If you don't specify the from/to boundaries, you get all documents that have a value for the given field.

Regards,
Pavel
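A minimal Python sketch of the first suggestion, building plain JSON bodies rather than calling a client library. The request and response shapes follow the old facets API (`facets` / `terms` / `field` / `size`, with `term` and `count` in each result entry) and are assumptions here, as are the placeholder `size` of 100 and the unbounded range body, which simply mirrors the description above. The sum is only exact if each document holds at most one value for the field (a multi-valued document contributes to several term counts) and if `size` is large enough for the facet to return every distinct term.

```python
import json


def terms_facet_request(field, size=100):
    # Assumed old-facets body: one terms facet named after the field.
    # The terms facet returns only the top `size` terms; 100 is a placeholder.
    return {
        "query": {"match_all": {}},
        "facets": {field: {"terms": {"field": field, "size": size}}},
    }


def docs_with_value(search_response, field):
    # Sum the per-term counts of the facet named `field` in a parsed response.
    entries = search_response["facets"][field]["terms"]
    return sum(entry["count"] for entry in entries)


def open_range_query(field):
    # Range query with no from/to bounds, mirroring the suggestion above (not verified here).
    return {"query": {"range": {field: {}}}}


# Hand-written response fragment shaped like a terms facet result for field1.
sample = {"facets": {"field1": {"_type": "terms",
                                "terms": [{"term": "abc", "count": 2}]}}}

print(json.dumps(terms_facet_request("field1")))
print(docs_with_value(sample, "field1"))  # 2
```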


On 16.08.2011 16:45, caphrim007 wrote:


Re: field count facet?

caphrim007
Thanks for the info. A couple more questions/clarifications.

If I were to make a facet that was the sum of the facet values, I
would need to know one of those values to begin with, wouldn't I? I'm
only interested in a count of the number of fields for a query, not a
count of the number of different values for a specified field.

A range query also looks like it requires a field name; again, I'm
looking for an aggregate count of fields, not field values.

Any idea?

Thanks,
Tim

On Aug 16, 9:14 am, Pavel Penchev <[hidden email]> wrote:


Re: field count facet?

Pavel Penchev
OK, here's my understanding: you have documents with a completely dynamic schema, fields can be added at any time, and you know neither the name nor the type of a field in advance. You then want a query that returns which fields appear in the result and the number of documents that have a value for each field.

I'm not aware of a built-in mechanism in ES to do that. A clumsy and possibly slow way to do it:
1) Using the mapping API, obtain all present fields and their types (e.g. 'curl http://localhost:9200/myindex/_mapping?pretty=true'; see http://www.elasticsearch.org/guide/reference/mapping/)
2) Add to your query a facet request for each of the fields from 1)
3) In the query response, take each facet and calculate how many documents have some value for that field (loop through all the values and sum their counts)
4) Any field from 1) that has no facet result in 3) is not present in the current result.
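The four steps can be sketched in Python as pure functions over already-parsed JSON, leaving the HTTP calls out. Assumptions: the `_mapping` response is shaped as index name, then type name, then `properties`, with object fields carrying their own nested `properties`; the search body uses the old facets API; `size=100` is a placeholder; and, as with the single-field case, summed counts overstate the number of documents for multi-valued fields. All function names and the type name `doc` are hypothetical, and the sample response is hand-written as if the query had matched only doc1 and doc2.

```python
def mapping_fields(type_mapping, prefix=""):
    """Collect field paths from a type mapping, descending into object fields."""
    fields = []
    for name, spec in type_mapping.get("properties", {}).items():
        path = prefix + name
        if "properties" in spec:  # object field: recurse into its sub-fields
            fields.extend(mapping_fields(spec, path + "."))
        else:
            fields.append(path)
    return fields


def fields_from_mapping(mapping_response, index):
    """Step 1: union of field paths over every type in the index (assumed shape)."""
    fields = []
    for type_mapping in mapping_response[index].values():
        for path in mapping_fields(type_mapping):
            if path not in fields:
                fields.append(path)
    return fields


def add_facets(query, fields, size=100):
    """Step 2: one terms facet per field; size=100 is an arbitrary placeholder."""
    return {
        "query": query,
        "facets": {f: {"terms": {"field": f, "size": size}} for f in fields},
    }


def field_document_counts(search_response, fields):
    """Steps 3 and 4: sum each facet's term counts; fields with no terms are absent."""
    counts, absent = {}, []
    for f in fields:
        terms = search_response.get("facets", {}).get(f, {}).get("terms", [])
        total = sum(t["count"] for t in terms)
        if total > 0:
            counts[f] = total
        else:
            absent.append(f)
    return counts, absent


# Hand-written mapping for the example data; "doc" is a hypothetical type name.
mapping = {"myindex": {"doc": {"properties": {
    "field1": {"type": "string"}, "field2": {"type": "string"},
    "field3": {"type": "string"}, "field4": {"type": "string"},
}}}}
fields = fields_from_mapping(mapping, "myindex")
body = add_facets({"match_all": {}}, fields)  # match_all stands in for your real query

# Pretend search response for a query that matched only doc1 and doc2.
response = {"facets": {
    "field1": {"terms": [{"term": "abc", "count": 2}]},
    "field2": {"terms": [{"term": "123", "count": 2}]},
    "field3": {"terms": [{"term": "fgh", "count": 1}]},
    "field4": {"terms": []},
}}
print(field_document_counts(response, fields))
# ({'field1': 2, 'field2': 2, 'field3': 1}, ['field4'])
```

Keeping the HTTP calls out of the functions means the output of the `curl` above and the search response can be fed in however they are fetched.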

Hope this helps,
Pavel

On 16.08.2011 17:58, caphrim007 wrote:

Re: field count facet?

caphrim007
I was completely unaware of the _mapping endpoint.

I can definitely make this work now.

Thanks for the suggested steps, Pavel!

-Tim

On Aug 17, 6:46 am, Pavel Penchev <[hidden email]> wrote:
