Terms facet all_terms does not work

Mustafa Sener
Hi,
I want to get all terms of a field. For this purpose I used the all_terms property in the terms facet, but it does not work. In my test I had 1000 distinct terms for a field; however, when I executed the terms facet with the "all_terms":true parameter, it returned only the first 10 terms. Is all_terms deprecated?
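A minimal sketch of the request body involved, assuming the 0.18-era terms facet syntax. The facet's `size` parameter defaults to 10, which matches the 10 terms returned here. Because `all_terms` does not lift that cap, `size` has to be raised explicitly. The field name `tag` is hypothetical, and the body is only built and printed, not sent to a cluster.

```python
import json


def terms_facet_body(field, size=10, all_terms=False):
    """Build a _search request body holding one terms facet named "terms".

    Assumes the 0.18-era terms facet syntax. "size" caps how many terms are
    returned (10 by default); "all_terms" only adds zero-count terms to the
    candidates and does not remove that cap.
    """
    facet = {"field": field, "size": size}
    if all_terms:
        facet["all_terms"] = True
    return {
        "query": {"match_all": {}},
        "facets": {"terms": {"terms": facet}},
    }


# "tag" is a hypothetical field with about 1000 distinct values.
body = terms_facet_body("tag", size=1000, all_terms=True)
print(json.dumps(body, indent=2))
```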

--
Mustafa Sener
www.ifountain.com

Re: Terms facet all_terms does not work

kimchy
Administrator
all_terms is a bad name... it basically means that you will also get back terms with a 0 count. There is no option to get all terms back; can you open an issue?
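The following toy Python model illustrates the distinction. It is not Elasticsearch code, and it assumes that a term's count is the number of matching documents containing it. With `all_terms`, the field's unmatched terms are only added with a count of 0, and `size` still truncates the ranked list.

```python
from collections import Counter


def simulate_terms_facet(docs, matching_ids, field, size=10, all_terms=False):
    """Toy model of terms-facet semantics (not Elasticsearch code).

    docs: dict of doc id -> dict of field name -> list of terms.
    matching_ids: ids of the documents that matched the query.
    Returns at most `size` (term, count) pairs, highest count first, ties by term.
    """
    counts = Counter()
    for doc_id in matching_ids:
        # Count each term once per matching document.
        counts.update(set(docs[doc_id].get(field, [])))
    if all_terms:
        # Terms of the field that no hit contains are added with count 0.
        for doc in docs.values():
            for term in doc.get(field, []):
                counts.setdefault(term, 0)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


# Hypothetical data: three documents, the query matches documents 1 and 2.
docs = {1: {"tag": ["a", "b"]}, 2: {"tag": ["b"]}, 3: {"tag": ["c"]}}
print(simulate_terms_facet(docs, [1, 2], "tag"))
print(simulate_terms_facet(docs, [1, 2], "tag", all_terms=True))
print(simulate_terms_facet(docs, [1, 2], "tag", size=2, all_terms=True))
```

In the last call, the zero-count term `c` is still cut off by `size=2`, which is the behavior reported in the original question.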

In reply to Mustafa Sener (Wed, Dec 7, 2011 at 11:35 PM)

Re: Terms facet all_terms does not work

project2501
Hi,
I was going to ask about this as well. I use the TermsComponent in Solr and am migrating to ES. The Solr TermsComponent returns all the terms in the index for a field, with frequency counts on them.

From the above reply, it seems ES does not have a similar feature, so I vote to open an issue to expose this Lucene capability in a similar fashion.
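For comparison, here is a sketch of the TermsComponent request being described. It assumes a Solr instance at the hypothetical base URL below, with a `/terms` request handler configured as in Solr's example `solrconfig.xml`. Setting `terms.limit` to a negative value returns all terms, and `terms.sort=index` returns them in index order instead of by count.

```python
from urllib.parse import urlencode


def solr_terms_url(base_url, field, limit=-1, sort="index"):
    """Build a Solr TermsComponent request URL.

    Assumes a request handler mapped at /terms with the TermsComponent
    enabled. terms.limit < 0 means no limit; terms.sort is "count" or "index".
    """
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.limit": limit,
        "terms.sort": sort,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/terms?{urlencode(params)}"


# Hypothetical host and field name.
url = solr_terms_url("http://localhost:8983/solr", "tag")
print(url)
```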

In reply to Shay Banon (Dec 7, 4:42 pm)

Re: Terms facet all_terms does not work

Loco Jay
+1 for the issue (I'm using 2 calls at the moment)
+1 for the field cache being able to go to disk and use less memory

In reply to project2501 (Dec 8, 2011, at 10:37 AM)

Re: Terms facet all_terms does not work

Ivan Brusic
Let me jump aboard and say that I am also looking for similar capabilities.

I am currently evaluating the feasibility of converting a modified Lucene project to ElasticSearch, and the functionality I am not able to replicate is the use of TermDocs. ElasticSearch has its own version of the FieldCache, and I am currently looking at what precisely it contains and whether it can be exposed. Having the ability to retrieve all terms for a field would eliminate the need to access the FieldCache, especially since I would need to execute some warmup queries in order to populate it.
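The following toy model shows how "all terms for a field" can come straight from the term dictionary, with no FieldCache or warmup involved. In Lucene 3.x, terms are ordered first by field name and then by text, so `IndexReader.terms(new Term(field, ""))` positions a `TermEnum` at the field's first term, and enumeration can stop once the field name changes. The Python below only mimics that ordering, and the data are made up.

```python
from bisect import bisect_left


def all_terms_for_field(term_dict, field):
    """Yield (text, doc_freq) for every term of `field`, in index order.

    Toy model of a Lucene 3.x-style term dictionary (not Lucene code):
    `term_dict` is a list of ((field, text), doc_freq) sorted by (field, text).
    Like seeking a TermEnum to Term(field, ""), we jump to the field's first
    entry and stop as soon as the field name changes.
    """
    keys = [key for key, _ in term_dict]
    i = bisect_left(keys, (field, ""))
    while i < len(term_dict) and term_dict[i][0][0] == field:
        (_, text), doc_freq = term_dict[i]
        yield text, doc_freq
        i += 1


# Hypothetical index contents: ((field, text), document frequency).
term_dict = sorted([
    (("body", "lucene"), 4),
    (("tag", "java"), 2),
    (("tag", "search"), 3),
    (("title", "intro"), 1),
])
print(list(all_terms_for_field(term_dict, "tag")))
```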

Ivan

In reply to Loco Jay (Thu, Dec 8, 2011 at 7:41 AM)

Re: Terms facet all_terms does not work

Mustafa Sener
Hi,
I created an issue: https://github.com/elasticsearch/elasticsearch/issues/1530

In reply to Ivan Brusic (Thu, Dec 8, 2011 at 9:57 PM)

--
Mustafa Sener
www.ifountain.com

Re: Terms facet all_terms does not work

kimchy
Administrator
In reply to this post by Ivan Brusic
What exactly are you looking for when working with TermDocs?

Re: Terms facet all_terms does not work

Ivan Brusic
We use TermDocs simply to get all values for a specific term. The existing Lucene infrastructure uses this information at startup to pre-calculate and cache various properties of the system and to provide some type of faceting. I am not looking to get TermDocs access in ElasticSearch, but a method to get all terms would be nice.
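Here is a rough illustration of what TermDocs provides. For each term, it iterates the documents containing that term (`doc()`) and the term's frequency in each document (`freq()`). The Python model below is not Lucene code. It builds that term -> postings mapping for one hypothetical field and then precomputes a per-term document frequency, as an example of the kind of startup calculation described above.

```python
from collections import defaultdict


def build_postings(docs, field):
    """Build term -> [(doc_id, freq), ...] for one field, doc ids ascending.

    Toy model of what iterating Lucene's TermDocs yields for each term
    (doc() and freq() per document); not Lucene code.
    """
    postings = defaultdict(dict)
    for doc_id in sorted(docs):
        for term in docs[doc_id].get(field, []):
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(hits.items()) for term, hits in postings.items()}


# Hypothetical documents with a multi-valued "tag" field.
docs = {
    1: {"tag": ["java", "search", "java"]},
    2: {"tag": ["search"]},
    3: {"tag": ["lucene"]},
}
postings = build_postings(docs, "tag")
# Example startup precomputation: number of documents per term.
doc_freq = {term: len(plist) for term, plist in postings.items()}
print(postings)
print(doc_freq)
```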

In reply to Shay Banon (Fri, Dec 9, 2011 at 2:34 PM)

Re: Terms facet all_terms does not work

kimchy
Administrator
There used to be a terms API in elasticsearch that returned all terms for a field, but it was not implemented properly (it did not support paginating through the terms while keeping a consistent view, similar to scrolling), so it was removed. We can try to implement it again, properly...
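The following Python sketch shows one way to get the consistent view mentioned here. It uses a hypothetical `TermsScroller` class, which is not an elasticsearch API and not how the removed terms API worked. The sketch pins a sorted snapshot of the terms when the scroll starts, much as a scroll keeps its index reader open, and resumes each page after the last term returned. Terms added to the live index mid-scroll therefore cannot shift pages or cause skipped or repeated terms.

```python
import itertools
from bisect import bisect_right


class TermsScroller:
    """Hypothetical paginated "all terms" API with a consistent view.

    Starting a scroll pins a sorted snapshot of the terms (standing in for
    keeping one index reader open, as scrolling does). Each later page resumes
    right after the last term returned, so terms added to the live index during
    the scroll cannot shift pages, skip terms, or repeat them.
    """

    def __init__(self):
        self._snapshots = {}
        self._ids = itertools.count(1)

    def start(self, live_terms, page_size):
        scroll_id = next(self._ids)
        self._snapshots[scroll_id] = sorted(set(live_terms))
        return scroll_id, self.next_page(scroll_id, None, page_size)

    def next_page(self, scroll_id, after, page_size):
        terms = self._snapshots[scroll_id]
        # bisect_right finds the first term strictly greater than `after`.
        start = 0 if after is None else bisect_right(terms, after)
        return terms[start:start + page_size]

    def clear(self, scroll_id):
        # Release the snapshot once the client is done.
        self._snapshots.pop(scroll_id, None)


live = {"c", "a", "b", "d"}
scroller = TermsScroller()
sid, page1 = scroller.start(live, 2)
live.add("aa")  # the index changes mid-scroll
page2 = scroller.next_page(sid, page1[-1], 2)
page3 = scroller.next_page(sid, page2[-1], 2)
scroller.clear(sid)
print(page1, page2, page3)
```

The cost of this design is memory. Each open scroll holds its own snapshot until it is cleared, which is why a real implementation would also need to expire abandoned scrolls.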

In reply to Ivan Brusic (Mon, Dec 12, 2011 at 5:13 AM)

Re: Terms facet all_terms does not work

project2501
That would be great! Thanks.

Also, I'm not sure whether Lucene supports this, but it would be cool if the terms feature could work on search results too, limiting the terms to only those contained in the result documents. The current behavior I see (from using Solr's TermsComponent) is that you can only get terms from the whole index.
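The following sketch shows the difference, using a hypothetical term -> set-of-doc-ids mapping. Index-wide counts (TermsComponent-style) come from each term's full posting list. Result-restricted counts intersect that list with the hit set and drop terms that do not occur in the hits. Counting only over the documents that match the query is essentially what the ES terms facet already does.

```python
def index_wide_terms(postings):
    """Every term with its index-wide document count (TermsComponent-style)."""
    return {term: len(doc_ids) for term, doc_ids in postings.items()}


def terms_in_results(postings, hit_ids):
    """Only terms occurring in the hits, each counted over the hits alone."""
    hits = set(hit_ids)
    counts = {term: len(doc_ids & hits) for term, doc_ids in postings.items()}
    return {term: n for term, n in counts.items() if n > 0}


# Hypothetical field data: term -> ids of the documents containing it.
postings = {"java": {1, 4}, "search": {1, 2, 3}, "lucene": {3}}
print(index_wide_terms(postings))
print(terms_in_results(postings, [1, 2]))
```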

In reply to Shay Banon (Dec 12, 7:58 am)