lack of memory?

lack of memory?

Szymon Gwóźdź
Hi!

I've launched 5 Elasticsearch nodes (each on a different host): 2 non-data nodes and 3 data nodes. Then I started pushing a few GB of data into the cluster. While doing this, I could see in htop that the ES processes were using more and more memory. After some time, the logs show entries like these:

[13:34:45,762][WARN ][jgroups.FD               ] I was suspected by ggmail-test-5-49710; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[13:34:46,042][WARN ][jgroups.pbcast.NAKACK    ] ggmail-test-3-55691: dropped message from ggmail-test-5-49710 (not in xmit_table), keys are [ggmail-test-2-44830, ggmail-test-3-55691, ggmail-test-2-29869, ggmail-test-4-9592, ggmail-test-1-19291], view=[ggmail-test-1-19291|9] [ggmail-test-1-19291, ggmail-test-4-9592, ggmail-test-2-29869, ggmail-test-3-55691, ggmail-test-2-44830]
[13:34:54,373][WARN ][jgroups.pbcast.NAKACK    ] ggmail-test-3-55691: dropped message from ggmail-test-5-49710 (not in xmit_table), keys are [ggmail-test-2-44830, ggmail-test-3-55691, ggmail-test-2-29869, ggmail-test-4-9592, ggmail-test-1-19291], view=[ggmail-test-1-19291|9] [ggmail-test-1-19291, ggmail-test-4-9592, ggmail-test-2-29869, ggmail-test-3-55691, ggmail-test-2-44830]

and this:

[WARN ][monitor.jvm              ] [Winky Man] Long GC collection occurred, took [10s], breached threshold [10s]

I'm no longer able to put new data into ES (into an existing index and type). curl returns:

{
  "error" : "PrimaryNotStartedActionException[[mail_index1001][3] Timeout waiting for [1m]]"
}
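
(The writes are ordinary index calls along these lines — the host, type name and document body here are only illustrative:

curl -XPUT 'http://<non-data-node>:9200/mail_index1001/mail/1' -d '{
    "subject" : "test",
    "body" : "some text ..."
}')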

In elasticsearch.in.sh I set the Java heap to 1 GB. Bigger values (like 3 GB) only change how long it takes before this happens.
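
(For reference, the script ultimately passes the heap size to the JVM as -Xms/-Xmx flags; the exact variable names in elasticsearch.in.sh vary by version, but the change looks something like this:

ES_MIN_MEM=1g
ES_MAX_MEM=1g
JAVA_OPTS="$JAVA_OPTS -Xms$ES_MIN_MEM -Xmx$ES_MAX_MEM")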

I use this configuration on the data nodes:

cluster:
    name: LocalCluster060

node:
    data: true

http:
    enabled: false

discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        tcpping:
            initial_hosts: 10.1.112.46[9700], 10.1.112.58[9700], 10.1.112.47[9700], 10.1.112.59[9700], 10.1.112.214[9700]

What is the cause of this problem?

Re: lack of memory?

kimchy
Administrator
In general, this should not happen; memory should be reclaimed correctly. It might indicate a memory leak (by the way, which version are you using?). Or it might just be that you are overloading the nodes and simply need to allocate more memory per JVM (as you suggested, raising it to 3 GB).

How familiar are you with Java? Is there a chance I can ask for a memory dump of the JVM so I can analyze it?

cheers,
shay.banon


Re: lack of memory?

Szymon Gwóźdź

2010/4/28 Shay Banon <[hidden email]>:
> In general, this should not happen; memory should be reclaimed correctly. It might indicate a memory leak (by the way, which version are you using?).

I'm using version 0.6.0.

> Or it might just be that you are overloading the nodes and simply need to allocate more memory per JVM (as you suggested, raising it to 3 GB).

Raising it to 3 GB only delays the moment at which ES "blocks" (I've tested it).

> How familiar are you with Java? Is there a chance I can ask for a memory dump of the JVM so I can analyze it?

I'm not very familiar with Java, but I will try to get a JVM memory dump.

cheers,
Szymon


Re: lack of memory?

kimchy
Administrator
You can use jmap (http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html). First, if you can send me the histo output, that may be enough (jmap -histo:live <pid>).
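
For example (jps is just one convenient way to find the process id):

jps -l                                      # find the Elasticsearch pid
jmap -histo:live <pid> > histo_elastic.txt  # histogram of live objects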

thanks!
shay.banon


Re: lack of memory?

Szymon Gwóźdź
Hi!

The histo output from one of the data nodes is attached.

cheers

[Attachment: histo_elastic (228K)]

Re: lack of memory?

kimchy
Administrator
Hi,

  Thanks! This certainly helps. Is there a chance you can take a full heap dump (it will be a large file) and put it somewhere I can download it from? I appreciate the effort in helping nail this down...
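
Something like this will write a binary dump file that you can compress and upload (standard JDK 6 jmap syntax):

jmap -dump:live,format=b,file=es-heap.hprof <pid>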

cheers,
shay.banon


Re: lack of memory?

kimchy
Administrator
Also, can you explain a bit about what you ran against the nodes? Did you index large documents? Did you perform a lot of searches, or mainly indexing?


Re: lack of memory?

kimchy
Administrator
  So, I analyzed it as best I can, and I may have found a problem that explains your case. It revolves around eagerly cleaning up the search context that backs a search request (by default, a context is only cleaned automatically after 5 minutes of inactivity). So, if your system executed a *lot* of search requests within that 5-minute window, that could explain the problem. The issue is: http://github.com/elasticsearch/elasticsearch/issues/issue/153.

cheers,
shay.banon


Re: lack of memory?

Szymon Gwóźdź
Hi!

Thanks for looking into this problem. I can make a full dump, but unfortunately not before Tuesday.
This is what I do to trigger the problem:

I index a few thousand random messages into a single index and type. Each message consists of a few small values plus one text value of 500 B to 100 KB; to support highlighting, the mapping for that (index, type) enables store and term_vector with_positions_offsets on the text field. Then I repeat the whole operation for the next index, and the next, and so on. Two such indexing processes run in parallel. After 20-60 indices (depending on memory) the problem appears.
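
(Roughly this kind of mapping — the field names are made up for illustration, and the syntax follows the current docs, so details may differ in 0.6:

{
    "message" : {
        "properties" : {
            "from" : { "type" : "string" },
            "body" : {
                "type" : "string",
                "store" : "yes",
                "term_vector" : "with_positions_offsets"
            }
        }
    }
})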

No search operations are run, so it can't be the issue you suggested.

cheers,
Szymon Gwóźdź


Re: lack of memory?

kimchy
Administrator
I see. Is there any chance you can send me the program you use, so that I can run it myself?

cheers,
shay.banon


Re: lack of memory?

kimchy
Administrator
By the way, another possible reason is that simply creating so many indices (each with 5 shards, and a replica for each shard) is too much for only 3 data nodes. Each shard instance is a Lucene index, and that takes its toll on memory usage. To verify this, try changing your test to use types within a single index instead of separate indices, and see whether the problem still occurs.
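
(To put rough numbers on it: at the top end of your test, 60 indices × 5 shards × 2 copies = 600 Lucene indices spread over 3 data nodes — about 200 shard instances per node.)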

cheers,
shay.banon
