Out Of Memory during the terms facets

Out Of Memory during the terms facets

Curt Hu
Actually, it's very strange that I'm running into this problem:

Index name: surfikisemterms

Index mapping:
{"surfikisemterms":{"jdbc_search":{"properties":{"facets":{"dynamic":"true","properties":{"termscount":{"dynamic":"true","properties":{"terms":{"dynamic":"true","properties":{"field":{"type":"string"},"size":{"type":"long"}}}}}}},"from":{"type":"long"},"query":{"dynamic":"true","properties":{"bool":{"dynamic":"true","properties":{"must":{"dynamic":"true","properties":{"query_string":{"dynamic":"true","properties":{"query":{"type":"string"}}}}}}}}},"size":{"type":"long"}}},"jdbc":{"properties":{"keywords":{"type":"string","analyzer":"comma"},"time":{"type":"date","format":"dateOptionalTime"}}}}}

Index setting:
{"surfikisemterms":{"settings":{"index.analysis.tokenizer.commatokenizer.type":"pattern","index.analysis.analyzer.comma.type":"custom","index.number_of_replicas":"1","index.version.created":"200599","index.analysis.tokenizer.commatokenizer.pattern":",","index.analysis.analyzer.comma.tokenizer":"commatokenizer","index.number_of_shards":"5"}}}

So we really only have 2 fields in the index: keywords, which is a comma-separated word list, and time, which is the timestamp.

I have 3 nodes in the cluster, all with these environment variables:
ES_HEAP_SIZE=3000M
ES_MIN_MEM=3000M
ES_MAX_MEM=3000M

Then, if I do a query like:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2013-03-07T11:29:19.000Z",
              "to": "2013-03-07T11:30:19.000Z"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {}
}

In total I get 21 hits, and the query returns very fast. Among these hits, no single doc contains more than 40 keywords.

Then if I do:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2013-03-07T11:29:19.000Z",
              "to": "2013-03-07T11:30:19.000Z"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 22,
  "sort": [],
  "facets": {
    "termscount": {
      "terms": {
        "field": "keywords",
        "size": 25
      }
    }
  }
}
I can see the following log message repeating several times:

[2013-03-08 03:31:05,571][WARN ][index.cache.field.data.resident] [Surfiki Master: Bobby Hutcherson] [surfikisemterms] loading field [keywords] caused out of memory failure
java.lang.OutOfMemoryError: Java heap space..


What confuses me is that for the above query, the matching docs contain no more than 500 keywords in total, so why does the terms facet cause an out-of-memory error?

Thanks.


Re: Out Of Memory during the terms facets

dadoonet
Hi,


Could you create a full curl recreation and gist it?
Which ES version are you running?
Which Java version?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Out-Of-Memory-during-the-terms-facets-tp4031285.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.



Re: Out Of Memory during the terms facets

sujoysett
In reply to this post by Curt Hu
Hi,

The following thread might help you.
https://groups.google.com/forum/#!topic/elasticsearch/4Uxbmy-e1ao

-- Sujoy.


Re: Out Of Memory during the terms facets

Curt Hu
In reply to this post by dadoonet
Hi, I've updated the gist here:
https://gist.github.com/BlueStalker/5115578

Please tell me if I have missed anything. Many thanks for the help.

Re: Out Of Memory during the terms facets

Curt Hu
Hello, guys.

Is there any update on this issue?
Thanks.

Re: Out Of Memory during the terms facets

dadoonet
In reply to this post by Curt Hu
Hi,


You need more memory to let Elasticsearch load all the distinct values into the cache.
You should try running 0.90.0.Beta1, as the memory footprint has been reduced.

You can also redesign your indices, for example by date. That will reduce the number of terms to load into memory.
It depends on your use case here: it sounds like your filter applies to a single day. So, an index per day?
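For what it's worth, here is a tiny sketch of that idea (the `surfikisemterms-YYYY.MM.DD` naming below is just an assumption for illustration, not something from your setup): each doc goes into a daily index derived from its timestamp, and a time-range query then only needs to touch, and load field data for, the days it covers.

```python
from datetime import datetime, timedelta

# Hypothetical daily-index naming scheme; the prefix and date format
# are assumptions for illustration, not part of the original setup.
def index_for(ts: datetime) -> str:
    return "surfikisemterms-" + ts.strftime("%Y.%m.%d")

def indices_for_range(start: datetime, end: datetime) -> list:
    """All daily indices a [start, end] range query must search.
    Faceting then only loads field data for these days' docs."""
    days = (end.date() - start.date()).days
    return [index_for(start + timedelta(days=d)) for d in range(days + 1)]

start = datetime(2013, 3, 7, 11, 29, 19)
end = datetime(2013, 3, 7, 11, 30, 19)
print(indices_for_range(start, end))  # → ['surfikisemterms-2013.03.07']
```

A one-minute window like the one in this thread would then hit only one day's index, so the field data cache would only ever hold that day's keywords.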


-- 
David Pilato | Technical Advocate | Elasticsearch.com




Re: Out Of Memory during the terms facets

Curt Hu
Hi,

Actually, how does Elasticsearch implement the terms facet? You were talking about loading all the distinct values into the cache; what is the mechanism?

In my real example, the query filters the index down to a window of a few minutes, which yields only about 20 docs in the total hits, and the total number of terms is no more than 500. That should be a small enough number for any kind of sort.

Thanks.


Re: Out Of Memory during the terms facets

Clinton Gormley-2
If you need the field values for docs 1, 2, and 3 for this query, you're
probably going to need the values for docs 4, 5, and 6 at some stage in the
future.

So the most efficient thing to do is to load the field values for all
docs into memory in one go. Then they're available for future requests
without having to reload.

So unfortunately, even if your query matches just a few docs, you need
space on your heap for the values from all docs.
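To illustrate with a toy model (this is only a sketch of the idea, not Elasticsearch's actual field data implementation): the first facet on a field caches the values for every doc in the index, so heap cost is driven by index size, not by hit count.

```python
# Toy model of a field data cache: the first facet on a field loads
# values for EVERY doc into memory, not just the docs the query
# matched. A sketch only -- not Elasticsearch's real implementation.

def build_field_data(index):
    """Cache field values for all docs in one go (the expensive step)."""
    return {doc_id: terms for doc_id, terms in index.items()}

# An index with a million docs, each holding one keyword...
index = {doc_id: ["kw%d" % (doc_id % 1000)] for doc_id in range(1_000_000)}

# ...and a query matching only 21 of them.
hits = list(range(21))

field_data = build_field_data(index)   # heap cost: all 1,000,000 docs

# The facet counting itself only touches the hits -- that part is cheap.
counts = {}
for doc_id in hits:
    for term in field_data[doc_id]:
        counts[term] = counts.get(term, 0) + 1

print(len(field_data), sum(counts.values()))  # → 1000000 21
```

That is the point above in miniature: the 21-doc query still pays the memory cost of the whole index the first time the field is faceted on.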

clint
