Documentation for node level caching and a few caching related questions

ppearcy
Hi,
  Came across this feature:
http://github.com/elasticsearch/elasticsearch/issues/issue/235

But was not able to find it reflected in the docs. I figured it would
be in the Node settings.

Also, does this cache hold data for filter queries or other items, as
well?

While on the topic of caching, I was curious, at what level of
granularity are filter queries cached?

For example, in the sample query here:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/and_filter/

Are there two cache objects created for each filter (separate date and
name caches) or one cache that holds the results of everything
specified in filters?

How does this work in the case of nested filters? For the example
below, I'd guess one cached item for each filters grouping, so two
cached items: one for the main filters and the other for the
sub-filters.

{
    'query' : {
        'constant_score' : {
            'filter' : {
                'and' : {
                    'filters' : [
                        { 'term' : {'data' : 600} },
                        { 'term' : {'symbol' : 'msft'} },
                        { 'or' : {
                            'filters' : [
                                { 'term' : {'Language' : 'en'} },
                                { 'term' : {'Language' : 'fr'} }
                            ]
                        }}
                    ]
                }
            }
        }
    }
}
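The body above uses single quotes, so it reads like a Python dict literal rather than strict JSON. A minimal sketch, assuming the request is built in Python, of serializing the same structure to valid JSON before sending it. The variable names `query_body` and `payload` are illustrative, and no HTTP call is made here:

```python
import json

# Same filter tree as in the post, written as a Python dict.
query_body = {
    "query": {
        "constant_score": {
            "filter": {
                "and": {
                    "filters": [
                        {"term": {"data": 600}},
                        {"term": {"symbol": "msft"}},
                        {
                            "or": {
                                "filters": [
                                    {"term": {"Language": "en"}},
                                    {"term": {"Language": "fr"}},
                                ]
                            }
                        },
                    ]
                }
            }
        }
    }
}

# json.dumps emits double-quoted keys and strings, i.e. strict JSON.
payload = json.dumps(query_body)
```

The resulting `payload` string is what would go in the request body of a search call.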

Thanks!
Paul

Re: Documentation for node level caching and a few caching related questions

kimchy
Administrator
On Wed, Jul 28, 2010 at 8:00 AM, Paul <[hidden email]> wrote:
> Hi,
>  Came across this feature:
> http://github.com/elasticsearch/elasticsearch/issues/issue/235

This relates only to the case where you store the index in memory. I have not documented that change yet. It basically pre-allocates memory for the index to use, and this pre-allocated memory is now shared between shards.


> But was not able to find it reflected in the docs. I figured it would
> be in the Node settings.
>
> Also, does this cache hold data for filter queries or other items, as
> well?

There are different caches for different aspects of elasticsearch. The cache above relates only to storing the index in memory.


> While on the topic of caching, I was curious, at what level of
> granularity are filter queries cached?

That's a bit complicated when it comes to Lucene. In general, you should really care about it. If you are familiar with Lucene, filters are cached at the IndexReader level.
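To illustrate what caching "at the IndexReader level" means, here is a toy model in Python. It is only a sketch: `SegmentReader` and `PerReaderFilterCache` are hypothetical names invented for this example, not Lucene or elasticsearch classes. The point is that cached filter results are keyed by reader first, so a new reader starts with nothing cached.

```python
import weakref


class SegmentReader:
    """Hypothetical stand-in for a Lucene IndexReader over a fixed set of documents."""

    def __init__(self, docs):
        self.docs = docs  # list of dicts; the list index is the doc id


class PerReaderFilterCache:
    """Toy cache: reader -> {filter key -> frozenset of matching doc ids}."""

    def __init__(self):
        # Entries for a reader disappear once the reader itself is garbage collected.
        self._by_reader = weakref.WeakKeyDictionary()
        self.misses = 0

    def term_filter(self, reader, field, value):
        per_reader = self._by_reader.setdefault(reader, {})
        key = (field, value)
        if key not in per_reader:
            # Cache miss: compute the matching doc ids for this reader only.
            self.misses += 1
            per_reader[key] = frozenset(
                doc_id for doc_id, doc in enumerate(reader.docs)
                if doc.get(field) == value
            )
        return per_reader[key]


old_reader = SegmentReader([{"symbol": "msft"}, {"symbol": "goog"}])
cache = PerReaderFilterCache()
first = cache.term_filter(old_reader, "symbol", "msft")
second = cache.term_filter(old_reader, "symbol", "msft")  # served from the cache

# A different reader has its own, initially empty, set of cached filters.
new_reader = SegmentReader([{"symbol": "msft"}, {"symbol": "msft"}])
third = cache.term_filter(new_reader, "symbol", "msft")
```

In this model the same term filter is computed once per reader, which is why the second lookup against `old_reader` costs nothing while the first lookup against `new_reader` is a miss.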
 

> For example, in the sample query here:
> http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/and_filter/
>
> Are there two cache objects created for each filter (separate date and
> name caches) or one cache that holds the results of everything
> specified in filters?

It depends on the filter or query (one that accepts a filter). In the case of AndFilter, only the inner filters are cached. The combined result is not cached, as it makes little sense to cache it with the And filter implementation.
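As a rough way to picture this for the nested example from the original question, the sketch below walks a filter tree and collects the non-compound filters, which are the candidates for individual caching under the description above. Note the assumptions: the reply only covers the `and` filter, so treating `or` the same way is a guess made for illustration, and `cacheable_leaves` is a hypothetical helper, not an elasticsearch API.

```python
def cacheable_leaves(filter_node):
    """Return the non-compound filters found anywhere under filter_node."""
    (kind, body), = filter_node.items()  # each filter is a one-key dict
    if kind in ("and", "or"):
        # Compound filter: descend into its children instead of keeping it.
        # Handling "or" like "and" is an assumption, not stated in the thread.
        leaves = []
        for child in body["filters"]:
            leaves.extend(cacheable_leaves(child))
        return leaves
    return [filter_node]


nested = {
    "and": {
        "filters": [
            {"term": {"data": 600}},
            {"term": {"symbol": "msft"}},
            {"or": {"filters": [
                {"term": {"Language": "en"}},
                {"term": {"Language": "fr"}},
            ]}},
        ]
    }
}

leaves = cacheable_leaves(nested)
```

Under these assumptions the nested query yields four separately cacheable term filters rather than one entry per `filters` grouping.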
 



Re: Documentation for node level caching and a few caching related questions

ppearcy
Hi Shay,
  Many thanks for the details.

If using FS-based index storage, are there any caching settings
available?

Best Regards,
Paul


Re: Documentation for node level caching and a few caching related questions

kimchy
Administrator
In elasticsearch, there are two more caches: the first is the filter cache, and the second is what I call the field data cache (field data is used when sorting, faceting, or using scripts). The caching relies on JVM capabilities, so it's not like an LRU cache where you would configure the number of cache entries, eviction strategy, and so on. The only options are to disable it, or to choose between a weak and a soft cache.

-shay.banon
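For readers unfamiliar with reference-based caches, here is a small Python sketch of the weak variant. On the JVM, a weakly referenced value can be reclaimed as soon as nothing else holds a strong reference to it, while a softly referenced value is kept until memory runs low. Python has no soft references, so only the weak behaviour is shown, and `FieldData` is a hypothetical placeholder class rather than anything from elasticsearch.

```python
import gc
import weakref


class FieldData:
    """Hypothetical stand-in for an expensive per-field structure."""

    def __init__(self, values):
        self.values = values


# Weak cache: an entry survives only while something else references its value.
weak_cache = weakref.WeakValueDictionary()

data = FieldData([3, 1, 2])
weak_cache["price"] = data
present_while_referenced = "price" in weak_cache

del data      # drop the only strong reference
gc.collect()  # make collection explicit rather than relying on timing
present_after_release = "price" in weak_cache
```

There is no entry count or eviction policy to tune here: the garbage collector decides when entries go away, which matches the "not like an LRU" description above.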


Re: Documentation for node level caching and a few caching related questions

Otis Gospodnetic
Hi Shay,

On Jul 30, 7:24 am, Shay Banon <[hidden email]> wrote:
> In elasticsearch, there are two more caches: the first is the filter cache,
> and the second is what I call the field data cache (field data is used when
> sorting, faceting, or using scripts). The caching relies on JVM capabilities, so
> it's not like an LRU cache where you would configure the number of cache entries,
> eviction strategy, and so on. The only options are to disable it, or to choose
> between a weak and a soft cache.

I imagine the filter cache is actually ES-specific code.  Is the field
data cache also ES-specific, or are you referring to Lucene's
FieldCache?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Re: Documentation for node level caching and a few caching related questions

kimchy
Administrator
The field data cache is elasticsearch-specific. It replaces Lucene's FieldCache but serves a similar purpose, with extended functionality such as the ability to use it for other cases like facets and scripts. I tried to get some of these enhancements into Lucene (like using a concurrent soft map) but got pushed back.

-shay.banon
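As a loose illustration of why field data is worth caching, the sketch below loads per-document values for a field into an array once and then reuses it for both a facet count and a sort. This is a toy model under stated assumptions: the documents, `load_field_data`, and the variable names are all hypothetical, and the real field data cache is considerably more involved.

```python
from collections import Counter

# Hypothetical documents; the list index plays the role of the doc id.
docs = [
    {"Language": "en", "data": 600},
    {"Language": "fr", "data": 450},
    {"Language": "en", "data": 700},
]


def load_field_data(documents, field):
    """Build a doc id -> value array for one field (the expensive step worth caching)."""
    return [doc[field] for doc in documents]


language_values = load_field_data(docs, "Language")
data_values = load_field_data(docs, "data")

matching_doc_ids = [0, 1, 2]

# Faceting: count field values across the matching documents.
language_facet = Counter(language_values[d] for d in matching_doc_ids)

# Sorting: order the matching documents by a field value.
sorted_by_data = sorted(matching_doc_ids, key=lambda d: data_values[d])
```

Both operations only need a fast lookup from doc id to value, so once the arrays are loaded they can serve repeated requests without touching the documents again.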
