
Limiting the Field Cache with Filters on Documents


Mike
I understand that when you facet or sort on a field, Elasticsearch loads all of the possible values of that field into the field cache.  This can get huge and is often the cause of heap OOM errors, especially on fields with high cardinality.  My question: are the field values put into the field cache limited to the records returned by the query, or does Elasticsearch load all unique values of the field into the field cache regardless of the filters?  If the values are limited, which filters limit them?  Do only certain kinds of queries limit them as well?  I read another post that said only the constant_score query limits the records used in the cache.

Example:
{
    "query": {
        "filtered": {
            "query": { ...QUERY A... },
            "filter": { ... FILTER A...}
        }
    },
    "filter": { ...FILTER B... },
    "facets": {
        "hugeFacet": {
            "terms": {
                "field": "fieldWithGBofUniqueValues"
            },
            "facet_filter": { ...FILTER C... }
        }
    }
}


I know that QUERY A, FILTER A, FILTER B, and FILTER C will limit the records that my hugeFacet will use when returning results. However:
  1. Will all 4 of those also limit what is added into the field cache?  If not, which ones will and won't?  I believe I read somewhere that facet_filters (FILTER C) limit what is put into the field cache, but it never said anything about these other filters.
  2. Can any query be used in QUERY A to decrease the number of field values that are looked at and added into the field cache, or is "constant_score" the only one that will?  If that's the case then does that actually mean nothing in my entire "filtered" query will limit what's put into the field cache (QUERY A and FILTER A)?


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Limiting the Field Cache with Filters on Documents

Mike
Found the post: https://groups.google.com/d/topic/elasticsearch/ArOHQIKiMKE/discussion

"So if you need to account for filters when you run facets, you need to either wrap them in the constant_score query or use facet_filter."

Maybe I misunderstood.  Any clarification on my 2 questions above would be greatly appreciated.
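
One possible reading of the quoted advice, sketched as a request body built in Python. This is only an illustration under assumptions: the helper name facet_request and the fields "status" and "lang" are hypothetical, and the body is just printed, not sent to a cluster. It places one filter inside a constant_score query and another in facet_filter, and uses no top-level "filter" at all.

```python
import json


def facet_request(facet_field, query_filter=None, facet_filter=None):
    """Build a search body whose filters sit where the facet can see them."""
    if query_filter is not None:
        # Filter wrapped in a constant_score query, as the quoted post suggests.
        query = {"constant_score": {"filter": query_filter}}
    else:
        query = {"match_all": {}}

    facet = {"terms": {"field": facet_field}}
    if facet_filter is not None:
        # Filter attached to the facet itself.
        facet["facet_filter"] = facet_filter

    # No top-level "filter" key is emitted.
    return {"query": query, "facets": {"hugeFacet": facet}}


body = facet_request(
    "fieldWithGBofUniqueValues",
    query_filter={"term": {"status": "published"}},  # hypothetical field
    facet_filter={"term": {"lang": "en"}},           # hypothetical field
)
print(json.dumps(body, indent=4))
```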


(In reply to the original post of Monday, February 11, 2013, 3:24:46 PM UTC-5.)


Re: Limiting the Field Cache with Filters on Documents

Radu Gheorghe-2
Hello Mike,

For sorting, Elasticsearch sorts all the documents matching your query and filters, then gives you the top X items back. Any document that is filtered out by either your query or your filter won't touch the caches.

For faceting, the top-level filter doesn't matter: facets are computed on the query results. If you need to filter the documents on which facets are computed, you need to use facet_filter. In that case, the caches are used for all the documents on which faceting is done (query results, minus what facet_filter takes out).
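
The scoping described above can be illustrated with a toy model. This is a sketch under assumptions, not Elasticsearch code: plain Python sets stand in for document ids, and the documents and the fields "status", "lang" and "tag" are made up. It shows only which documents end up in the hits and which end up in the facet counts.

```python
# Toy model only: nothing here calls Elasticsearch.
docs = {
    1: {"status": "published", "lang": "en", "tag": "a"},
    2: {"status": "published", "lang": "de", "tag": "b"},
    3: {"status": "draft", "lang": "en", "tag": "a"},
    4: {"status": "published", "lang": "en", "tag": "c"},
}


def matching(predicate):
    """Return the ids of all documents that satisfy the predicate."""
    return {doc_id for doc_id, doc in docs.items() if predicate(doc)}


def search(query, top_level_filter, facet_filter, facet_field):
    query_results = matching(query)
    # The top-level filter narrows the returned hits only.
    hits = query_results & matching(top_level_filter)
    # The facet works on query results minus what facet_filter takes out.
    facet_docs = query_results & matching(facet_filter)
    counts = {}
    for doc_id in facet_docs:
        value = docs[doc_id][facet_field]
        counts[value] = counts.get(value, 0) + 1
    return sorted(hits), counts


hits, facet_counts = search(
    query=lambda d: d["status"] == "published",  # stands in for QUERY A + FILTER A
    top_level_filter=lambda d: d["tag"] == "a",  # stands in for FILTER B
    facet_filter=lambda d: d["lang"] == "en",    # stands in for FILTER C
    facet_field="tag",
)
print(hits, facet_counts)
```

In this model, document 4 is excluded from the hits by the top-level filter but is still counted by the facet, while document 2 is dropped from the facet by facet_filter.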

Best regards,
Radu
--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

(In reply to Mike's follow-up of Wed, Feb 13, 2013 at 5:50 PM.)