terms facet explodes memory


Jürgen kartnaller
The terms facet seems to read the terms field from ALL documents into the field cache, not only the fields from the query result.

This also happens if the query returns no results for the facet.

In our case this results in:
   java.lang.OutOfMemoryError: Java heap space
which then leads to an unresponsive cluster (we need to restart all ES instances).

To my understanding, the facet should only read the fields contained in the result of the query.

Is there a way to avoid this problem?

Jürgen


Re: terms facet explodes memory

kimchy
Administrator
Facets cause fields to be completely loaded into memory (it's documented for each facet). The reason for that is performance: you don't want to go to disk for every potential hit in order to fetch the value.



Re: terms facet explodes memory

Jürgen kartnaller
This basically means I need more memory.
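In case it helps others, a minimal sketch of raising the heap before starting a node (the ES_MIN_MEM/ES_MAX_MEM variable names are my assumption from the 0.x startup scripts; check bin/elasticsearch.in.sh for your version):

# set a fixed-size heap for the node before starting it
# (variable names assumed from the 0.x scripts, verify for your version)
export ES_MIN_MEM=8g
export ES_MAX_MEM=8g
bin/elasticsearch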



Re: terms facet explodes memory

kimchy
Administrator
Yea :). Though I do want to try to allow for another "cache" mechanism that would not keep all values in memory but still give good performance when doing facets, but that's down the road...


Re: terms facet explodes memory

Jürgen kartnaller
Thanks, Shay.

We are now using m2.xlarge instances with 30GB for ES. We will see tomorrow how it works.

We will have 5.5T documents as a start, and will run a lot of facet queries. We also implement our own custom facets to fulfill customer requirements.


Re: terms facet explodes memory

Stéphane Raux
Hi,

I have the same problem.

The point is that once all the fields are loaded into memory for a terms facet, the memory is never released, so if I run several terms facets on several fields, I end up with an OutOfMemoryError.

Would it be possible to provide a mechanism to free the memory taken by the fields? Or to check whether the node has enough memory before loading the fields?

Stéphane


Re: terms facet explodes memory

Jürgen kartnaller
To solve this problem we now have our own facet implementation which does not use the field cache.

For us this is possible because we always have a small query result set as input for the facets: the query filters about 100k documents out of 8G. With the 100K docs the facet is still fast enough without a field cache.

We did this only for fields containing strings; we still use the cache for date and numerical fields.

Jürgen


Re: terms facet explodes memory

Stéphane Raux
It seems to be a good solution for my use case; I am also doing facets with small subsets of my documents.

Did you implement it with the Java API? Is it available somewhere?

Stéphane


Re: terms facet explodes memory

Jürgen kartnaller
It is implemented as a plugin but is not yet publicly available :(
I also made a simple distinct facet, also for small data sets.

I will try to make it public if I find the time.

Jürgen


Re: terms facet explodes memory

Stéphane Raux
Thank you for the plugin, I hope you will find some time to make it public!

Anyway, would it be possible to provide a way to free the memory taken by the facet values, maybe with an explicit call on a given field or with an optional timeout?

Another option might be a slower implementation for requesting facets on small subsets of documents?

Should I open a feature request or an issue?

Stéphane


Re: terms facet explodes memory

kimchy
Administrator
There is a way to clear the field data cache (there is an API for that, called clear cache), but not for one specific field. Open an issue for that one, it's a good idea to have it.
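A minimal sketch of the call (the index name is made up, and field_data is how I remember the cache selector being spelled; check the docs for your version):

curl -XPOST 'http://localhost:9200/my_index/_cache/clear?field_data=true'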

Regarding the slower impl, I am guessing it's implemented either by going to stored fields, or by extracting the stored source, parsing it, and fetching the value. That's going to be expensive, but for a small result set it might make sense. You can actually do that (for some facets) by using the script option, since you can do both _source.obj.field (loads the source and parses it automatically) or _fields.field_name (fetches a stored field).
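A sketch of what that can look like with a terms facet, assuming the facet's script_field option and a made-up category field (this walks the source per matching doc, so it is the slow path):

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "categories": {
      "terms": {
        "script_field": "_source.category",
        "size": 10
      }
    }
  }
}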


Re: terms facet explodes memory

Jürgen kartnaller


Exactly, I'm doing it on stored fields. Also using it for more complex custom facets.
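Note that this only works for fields that are actually stored. A minimal mapping sketch (type and field names made up here):

{
  "doc": {
    "properties": {
      "category": { "type": "string", "store": "yes" }
    }
  }
}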
 



Re: terms facet explodes memory

Stéphane Raux
My bad, I didn't notice the clear cache API and the field_data option. I think it will be enough to solve my problem.

I have opened an issue:
https://github.com/elasticsearch/elasticsearch/issues/1374

Thanks,

Stéphane


Re: terms facet explodes memory

andym
Hi,
I am running into the same facet / OOM problem. In our case we have around 7M docs (10G index size, 5 shards, 2 replicas running on 2 m1.large instances) and 7 facets that we actively query against. Unfortunately the number of distinct terms in one of the fields we facet on got very large (probably tens of thousands) and we get OOM.

While adding one more machine rebalances the shards very nicely (I know with our config we can go up to 10) and the OOM problem then goes away, I'd like to explore reducing the dimensionality of the facets without re-indexing the whole thing at the moment: is it possible through scripting (or other means) to include in the facet calculation only those terms that match specific criteria (i.e. consist of one word, start with a*, etc.)?
In other words, given a query such as:

{
  "query": {
    "query_string": {
      "query": "hello"
    }
  },
  "facets": {
    "myfacet1": {
      "terms": {
        "field": "myfacet1",
        "size": 50
      }
    }
  }
}


Is it possible to include a script section in it that would instruct the facet to load (and cache) only the subset of terms matching a criterion (i.e. consisting of one word)?
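Something along these lines is what I am hoping for (hypothetical, I don't know whether the terms facet accepts such a restriction):

{
  "facets": {
    "myfacet1": {
      "terms": {
        "field": "myfacet1",
        "regex": "a.*",
        "size": 50
      }
    }
  }
}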

Thank you!

-- Andy


Re: terms facet explodes memory

Jürgen kartnaller


No, the facet always pulls the full index of the field into memory.
 



Re: terms facet explodes memory

andym
Hi Jürgen,

Is there a chance you can post your facet implementation that does not use the field cache somewhere? If the code is not "release ready", that's perfectly OK, as otherwise I will probably end up doing work similar to what you have already done. In one of our scenarios the result set is rather small, a few thousand items, so retrieving the data directly from stored fields should give us very reasonable performance.

Alternatively, what is the way to aggregate values from stored fields through scripting, as Shay suggests? From the docs I see I can retrieve a particular stored field value (http://www.elasticsearch.org/guide/reference/api/search/script-fields.html) but I could not find any example of how to aggregate these values over the returned document set.
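Is it something like this (untested, assuming the terms facet takes a script_field and that myfacet1 is a stored field)?

{
  "query": {
    "query_string": { "query": "hello" }
  },
  "facets": {
    "myfacet1": {
      "terms": {
        "script_field": "_fields['myfacet1'].value",
        "size": 50
      }
    }
  }
}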

Thanks!

-- Andy


On Nov 4, 2:10 pm, Jürgen kartnaller <[hidden email]>
wrote:

> On Fri, Nov 4, 2011 at 5:18 PM, andym <[hidden email]> wrote:
> > Hi,
> > I am running into the same facet / OOM problem. In our case we have
> > around 7M docs (10G index size with 5 shards, 2 replicas running on 2
> > m1.large instances) and 7 facets that we actively query against.
> > Unfortunately the number of elements in one of the fields that we
> > facet on got very large (probably tens of thousands) and we get OOM.
>
> > While adding one more machine rebalances the shards very nicely (I
> > know that with our config we can go up to 10) and makes the OOM
> > problem go away, I'd like to explore the possibility of reducing the
> > dimensionality of the facets without re-indexing the whole thing.
> > Is it possible, through scripting (or other means), to include in
> > the facet calculations only those facet terms that match specific
> > criteria (i.e. consist of one word, start with a*, etc.)?
> > In other words, given a query such as
>
> > {
> >  "query": {
> >    "query_string": {
> >      "query": "hello"
> >    }
> >  },
> >  "facets": {
> >    "myfacet1": {
> >      "terms": {
> >        "field": "myfacet1",
> >        "size": 50
> >      }
> >    }
> >  }
> > }
>
> > Is it possible to include a scripting section in it that would
> > instruct the facet to load (and cache) only the subset of terms
> > matching a given criterion (e.g. terms consisting of one word)?
>
> No, the facet always pulls the full index of the field into memory.
>
> > Thank you!
>
> > -- Andy
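
P.S. Regarding the “only terms that match specific criteria” question
quoted above: the closest built-in options I have found are the terms
facet's exclude and regex settings, along these lines (again only a
sketch – I have not checked which regex flags our version supports):

{
  "query": {
    "query_string": {
      "query": "hello"
    }
  },
  "facets": {
    "myfacet1": {
      "terms": {
        "field": "myfacet1",
        "size": 50,
        "regex": "a.*"
      }
    }
  }
}

But if the facet always pulls the full field into memory first, as
Jürgen explains above, this would only filter the counted terms and
would not help with the OOM itself.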