Indices are missing. Help!


Matthew Eash
I have a 3-node ES 1.4.1 cluster running on CentOS 6 with Oracle JDK 1.7.0_67. The heap is set to 20G of the 32G on each box, with mlockall enabled. The configuration is currently tuned more for bulk loading than for searching. The purpose of the cluster is time-series indexing of logged metrics. I originally had one large index (1.1B docs) tracking a high-frequency metric over the past several months, but recently changed the schema design to one index per day. I was loading additional metrics as well as reimporting the data from that large index into per-day indices. Search usage is very light at the moment.

Last night I finished a multi-day bulk import of several months' worth of multiple log metrics into per-day indices. Each per-day index held either 12M or 48M records and used the settings {shards=4, replication=0, refresh_interval=-1} while I bulk loaded. Once a day was fully loaded and needed no more writes, I optimized it to 1 segment (30-45s each); the plan was to set {replication=1, refresh_interval=30s} once all of them had been optimized. As of last night I was about 1/4 of the way through the optimizing, and none of the per-day indices had replicas; only the large index was replicated.
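The per-day workflow above (create for bulk load, optimize to one segment, then re-enable replicas and refresh) can be sketched as a sequence of REST requests. This is a minimal sketch that only builds `(method, path, body)` tuples rather than sending them; the helper names are hypothetical, while the setting keys `number_of_shards`, `number_of_replicas`, `refresh_interval` and the ES 1.x `_optimize?max_num_segments=1` and `_settings` endpoints are the standard 1.x ones. The bulk-loading step itself is left out.

```python
import json

# Hypothetical helper names; paths and setting keys are standard Elasticsearch 1.x.

def create_for_bulk_load(index, shards=4):
    """Create-index request tuned for bulk loading: no replicas, refresh disabled."""
    body = {"settings": {"index": {
        "number_of_shards": shards,
        "number_of_replicas": 0,
        "refresh_interval": "-1",
    }}}
    return ("PUT", "/" + index, json.dumps(body))

def optimize_to_one_segment(index):
    """Merge a finished index down to a single segment (ES 1.x _optimize API)."""
    return ("POST", "/" + index + "/_optimize?max_num_segments=1", None)

def enable_serving(index, replicas=1, refresh="30s"):
    """After optimizing: turn replicas and periodic refresh back on."""
    body = {"index": {"number_of_replicas": replicas, "refresh_interval": refresh}}
    return ("PUT", "/" + index + "/_settings", json.dumps(body))

def per_day_plan(index):
    """Ordered requests for one per-day index; bulk indexing happens between steps 1 and 2."""
    return [
        create_for_bulk_load(index),
        optimize_to_one_segment(index),
        enable_serving(index),
    ]
```

For example, `per_day_plan("temp-2014-11-14")` yields the three requests for that day's index, which could then be sent with any HTTP client.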

After the bulk import was done, I was poking around the ES API, not doing anything extraordinary (some searches, and some optimizes/merges of individual per-day indices, which I had also been doing during the bulk import). At that point some event spun out 2 of the nodes, making them inaccessible. I'm still trying to diagnose exactly what occurred. This is not the first time a node has mysteriously spun out, but it has never been 2 at once. I believe the JVM is somehow locking up the kernel: I could ping the machines but could not access them in any other way. Through the night, the inaccessible machines seem to have occasionally tried to re-form the cluster, only to disappear again. The remaining node just flailed, trying to establish a master most of the time.

This morning, I had to have the machines physically rebooted at the console, as they were still unresponsive.

So I'm still trying to diagnose what exactly went wrong. I do recall seeing the heap size on all the nodes grow to about double the 20G I had assigned, but I'm unsure whether that caused the freeze-up. (I would love to know where to start looking.)
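One possible starting point is to compare what each JVM reports for heap used against its configured maximum. The Java heap cannot grow past `-Xmx`, so if `heap_max` reads 20G, a figure near 40G is likely counting memory outside the heap. A minimal sketch, assuming the parsed JSON from `GET /_nodes/stats/jvm` and the ES 1.x field names `jvm.mem.heap_used_in_bytes`, `heap_max_in_bytes` and `heap_used_percent`; the function name is hypothetical.

```python
GB = 1024 ** 3

def heap_report(nodes_stats):
    """Summarize heap usage per node from a parsed GET /_nodes/stats/jvm response.

    ASSUMPTION: ES 1.x node-stats layout, nodes -> <node_id> -> jvm -> mem.
    Returns {node_name: (heap_used_gb, heap_max_gb, heap_used_percent)}.
    """
    report = {}
    for node in nodes_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        report[node["name"]] = (
            round(mem["heap_used_in_bytes"] / GB, 1),
            round(mem["heap_max_in_bytes"] / GB, 1),
            mem["heap_used_percent"],
        )
    return report
```

Polling this periodically and logging the result would show whether the heap itself was near its limit before a node went unresponsive.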

However, my more immediate issue: when the cluster came back up after the reboot, only 1 index is showing, my original 1.1B-doc replicated index. All of the per-day indices created over the past 2 weeks are completely missing from ES. Yesterday /_cat/indices showed 276 healthy green indices; today it shows only 1. Looking at the raw data directories (split across 2 volumes on local spinning disks), everything is still there: all the index directories exist, and inside them I see all the raw Lucene shard directories and segment files.
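To get an exact list of what is on disk but no longer known to the cluster, the index directory names can be diffed against the output of `GET /_cat/indices?h=index` (one name per line). A minimal sketch, assuming the ES 1.x default data layout `<path.data>/<cluster_name>/nodes/<N>/indices/<index_name>/...`; the function names, the cluster name `mycluster` and the index `big-index` in the demo are hypothetical.

```python
import os
import tempfile

def indices_on_disk(data_paths, cluster_name):
    """Collect index directory names under every data path.

    ASSUMPTION: ES 1.x layout <path.data>/<cluster_name>/nodes/<N>/indices/<index>/...
    """
    found = set()
    for data_path in data_paths:
        nodes_dir = os.path.join(data_path, cluster_name, "nodes")
        if not os.path.isdir(nodes_dir):
            continue
        for node_ordinal in os.listdir(nodes_dir):
            indices_dir = os.path.join(nodes_dir, node_ordinal, "indices")
            if os.path.isdir(indices_dir):
                found.update(
                    name for name in os.listdir(indices_dir)
                    if os.path.isdir(os.path.join(indices_dir, name))
                )
    return found

def indices_missing_from_cluster(on_disk, cat_indices_output):
    """Names present on disk but absent from `_cat/indices?h=index` output."""
    in_cluster = {line.strip() for line in cat_indices_output.splitlines() if line.strip()}
    return sorted(on_disk - in_cluster)

def demo():
    """Throwaway tree in the assumed layout: 3 index dirs, cluster reports only 1."""
    with tempfile.TemporaryDirectory() as root:
        for name in ["big-index", "temp-2014-11-14", "temp-2014-11-15"]:
            os.makedirs(os.path.join(root, "mycluster", "nodes", "0", "indices", name, "0"))
        on_disk = indices_on_disk([root], "mycluster")
        return indices_missing_from_cluster(on_disk, "big-index\n")
```

Passing both volumes as `data_paths` covers an index whose shards are split across them, since names are collected into one set.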

Since the cluster reboot, only this stands out in the logs, from the master node:
[2015-01-07 12:22:14,348][INFO ][gateway                  ] [node3] recovered [1] indices into cluster_state
[2015-01-07 12:22:14,440][INFO ][indices.store            ] [node3] Failed to open / find files while reading metadata snapshot

Subsequent restarts report only 1 index recovered and don't show the metadata failure message.

Is there any way to repair the index metadata so that the indices that were all there yesterday, and still exist on disk, come back? How do I go about cleaning this up? I can find nothing in the ES documentation about internal index metadata (where it's stored, how to fix corruption) or about this error message.
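On the question of where the metadata lives: a quick check is whether each index directory still has its own metadata state file. A minimal sketch, assuming that ES 1.x keeps per-index metadata in `<index_dir>/_state/state-<version>` files; that layout is an assumption here, so treat an empty result as a hint to investigate, not proof of corruption. The function names and the demo tree are hypothetical.

```python
import os
import tempfile

def index_state_files(indices_dir):
    """Map each index directory under `indices_dir` to its metadata state files.

    ASSUMPTION: ES 1.x stores per-index metadata as <index>/_state/state-<version>.
    An empty list means no such file was found for that index on this node.
    """
    result = {}
    for name in sorted(os.listdir(indices_dir)):
        index_dir = os.path.join(indices_dir, name)
        if not os.path.isdir(index_dir):
            continue
        state_dir = os.path.join(index_dir, "_state")
        files = []
        if os.path.isdir(state_dir):
            files = sorted(f for f in os.listdir(state_dir) if f.startswith("state-"))
        result[name] = files
    return result

def indices_without_metadata(indices_dir):
    """Index directories with no state-* metadata file."""
    return [name for name, files in index_state_files(indices_dir).items() if not files]

def demo():
    """Throwaway tree in the assumed layout: one index with metadata, one without."""
    with tempfile.TemporaryDirectory() as root:
        for name, has_state in [("big-index", True), ("temp-2014-11-14", False)]:
            os.makedirs(os.path.join(root, name, "0", "index"))
            if has_state:
                state_dir = os.path.join(root, name, "_state")
                os.makedirs(state_dir)
                open(os.path.join(state_dir, "state-7"), "w").close()
        return index_state_files(root), indices_without_metadata(root)
```

Run it against each node's `indices` directory; comparing results across nodes shows whether the metadata is missing everywhere or only on the node that became master.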

I want to root-cause the node failures, but that is likely a deep issue that will take a while to research and diagnose. My more immediate need is getting those indices back! Any attempt to view or operate on those indices now returns an IndexMissingException.

My only clue so far is that, through the night, one of the failing nodes kept trying to re-form a 2-node cluster with the lone working node, with itself as master, then kept failing and dropping the other node from the cluster. During that time, and after the new master found itself alone, this appeared in the log for many of the per-day indices:
[2015-01-07 00:05:20,254][DEBUG][action.admin.indices.stats] [node1] [temp-2014-11-14][3], node[fwGNfUZJTmmkAj4hpCobWg], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@7b542938]
org.elasticsearch.transport.NodeDisconnectedException: [node3][inet[/172.16.0.34:9300]][indices:monitor/stats[s]] disconnected
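To see exactly which per-day indices hit these stats failures, and how often, the DEBUG lines can be tallied per index. A minimal sketch, assuming the line format shown above; the regex was written against that one sample, so other loggers or versions may need adjustments. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches "...[DEBUG][action.admin.indices.stats] [node] [index][shard]..." as in the sample.
STATS_FAILURE = re.compile(
    r"\[DEBUG\s*\]\[action\.admin\.indices\.stats\s*\] "
    r"\[([^\]]+)\] \[([^\]]+)\]\[(\d+)\]"
)

def stats_failures(lines):
    """Count indices-stats 'failed to execute' lines per index name."""
    counts = Counter()
    for line in lines:
        m = STATS_FAILURE.search(line)
        if m and "failed to execute" in line:
            counts[m.group(2)] += 1
    return counts
```

Feeding it the lines of each node's log file gives a per-index count that can be compared with the list of indices that disappeared.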

This happened again 2 hours later. Could the master have dropped those indices after these stats request failures?

Any assistance would be greatly appreciated! The cluster is behaving fine now that the nodes have been rebooted; it's just missing 275 indices that are still on disk...
 


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e0477411-bf07-4e6c-9d56-4db81a4d6798%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Indices are missing. Help!

Matthew Eash
As a follow-up: there seems to be a major issue with indices created from a mapping template. I installed a fresh standalone copy of ES 1.4.2 on my laptop and reproduced the issue I had on my ES 1.4.1 cluster: indices disappearing on cluster restart.

https://github.com/elasticsearch/elasticsearch/issues/9223

I would love some insight from the ES devs on whether it's possible to get the "disappeared" indices back and visible in ES.


Re: Indices are missing. Help!

vachan_da
This post has NOT been accepted by the mailing list yet.
Hey, I know this is an old post and the issue might already be resolved, but I'm facing it too. In my case, after the cluster restart all the indices show up, but they report 0 docs. Could you please tell me how you resolved the issue, or share any insights into it?

I'm currently using ES 1.4.2, and the indices were created dynamically by Logstash.
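To list exactly which indices came back empty, the output of `GET /_cat/indices?h=index,docs.count` can be filtered for a zero count. A minimal sketch, assuming whitespace-separated columns in that order; rows with a blank count (which can happen for indices that are not fully allocated) and the header line added by `?v` are skipped. The function name and the `logstash-...` index names in the usage note are illustrative.

```python
def zero_doc_indices(cat_output):
    """Return index names whose docs.count is 0 in `_cat/indices?h=index,docs.count` output."""
    empty = []
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # blank line, the ?v header, or a row without a count
        if int(parts[1]) == 0:
            empty.append(parts[0])
    return empty
```

For example, output of `logstash-2015.01.06 0` and `logstash-2015.01.07 1234` on separate lines yields `["logstash-2015.01.06"]`.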