Indexes seem corrupted

Indexes seem corrupted

John Chang
We are worried our indexes are corrupted, for a number of reasons.  We are looking through the logs to see what might have happened, but do not yet have a grasp on it.  Any advice on understanding, troubleshooting, and preventing what we are seeing would be greatly appreciated.  Thanks.

1) We keep 4 document types; they all used to have desired mappings, now 2 of the 4 seem to be missing the mappings.  Our system maps all 4 types at once and we are confident those mappings used to be there for all types.

2) We lost a lot of documents; when we do a count, only a fraction remains of what used to be there.

3) We are getting an error we've never seen before (see below).  The document type in question here does still seem to have the correct mappings.

[Failed to execute main query]]; nested: CompileException[[Error: Invalid shift value in prefixCoded string (is encoded value really an INT?)]\n[Near : {... Unknown ....}]\n             ^\n[Line: 1, Column: 0]]; nested: NumberFormatException[Invalid shift value in prefixCoded string (is encoded value really an INT?)]; }{[6fe786b4-de13-451c-8296-7803b8bbe1d8][index0][2]: RemoteTransportException[[Angel][inet[\/10.198.109.171:9300]][search\/phase\/query]]; nested: QueryPhaseExecutionException[[index0][2]: query[custom score (+userId:4c6b25774f8bd5147ab46cf4 +(body:\"john smith\" subject:\"john smith\" to:\"john smith\" from:\"john smith\" cc:\john smith\"),function=org.elasticsearch.index.query.xcontent.CustomScoreQueryParser$ScriptScoreFunction@7daf32e3)],from[0],size[100]: Query Failed
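
For readers unfamiliar with the message: the NumberFormatException above comes from decoding a term that is expected to be a prefix-coded int. The sketch below is a rough Python rendering of that check, written from memory of Lucene's NumericUtils. The constant 0x60 and the 7-bit layout are assumptions and not taken from this thread, so verify them against the Lucene version bundled with your release. It only illustrates that a term indexed as a plain string fails the shift check, which would be consistent with a numeric field having been indexed while its mapping was missing.

```python
# Sketch only: an approximation of the int prefix-coding check.
# ASSUMPTION: SHIFT_START_INT and the 7-bit payload layout mirror Lucene's
# NumericUtils as recalled; they are not confirmed by the thread.
SHIFT_START_INT = 0x60


def int_to_prefix_coded(value, shift=0):
    """Encode a signed 32-bit int as a prefix-coded term (first char = shift marker)."""
    if not 0 <= shift <= 31:
        raise ValueError("shift must be in 0..31")
    # Flip the sign bit so that unsigned ordering matches signed ordering.
    sortable = ((value & 0xFFFFFFFF) ^ 0x80000000) >> shift
    n_chars = (31 - shift) // 7 + 1
    payload = []
    for _ in range(n_chars):
        payload.append(chr(sortable & 0x7F))  # lowest 7 bits first
        sortable >>= 7
    return chr(SHIFT_START_INT + shift) + "".join(reversed(payload))


def prefix_coded_to_int(term):
    """Decode a prefix-coded term, rejecting terms whose first char is not a valid int shift."""
    shift = ord(term[0]) - SHIFT_START_INT
    if shift < 0 or shift > 31:
        raise ValueError(
            "Invalid shift value in prefixCoded string "
            "(is encoded value really an INT?)"
        )
    sortable = 0
    for ch in term[1:]:
        if ord(ch) > 0x7F:
            raise ValueError("Invalid prefixCoded numerical value representation")
        sortable = (sortable << 7) | ord(ch)
    bits = ((sortable << shift) ^ 0x80000000) & 0xFFFFFFFF
    # Convert the unsigned 32-bit pattern back to a signed int.
    return bits - (1 << 32) if bits & 0x80000000 else bits
```

With this sketch, a term produced by int_to_prefix_coded decodes back to the original value, while a plain string term such as the userId in the query above is rejected because its first character is not a valid shift marker.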

Re: Indexes seem corrupted

John Chang
I should add that the index was created on Elastic Search 0.11 and we upgraded to 0.12.1, without reindexing (which we understood to be not necessary as we are not doing geo searches).  We tested it after the upgrade and it seemed fine then; not sure when it went off the rails.

We don't expect this has to do with the upgrade, but wanted to call it out in case it is useful info.

Re: Indexes seem corrupted

Clinton Gormley
Hi John

On Wed, 2010-11-17 at 09:43 -0800, John Chang wrote:
> I should add that the index was created on Elastic Search 0.11 and we
> upgraded to 0.12.1, without reindexing (which we understood to be not
> necessary as we are not doing geo searches).  We tested it after the upgrade
> and it seemed fine then; not sure when it went off the rails.
>
> Not expecting this has to do with the upgrade, but just wanted to call it
> out just in case it was useful info.

This does sound like your indexes have been corrupted somewhere along
the way. You may have been hit by this bug:
http://github.com/elasticsearch/elasticsearch/issues/issue/466
Although I'm not sure if that would result in you losing mappings.

It would be worth gist'ing your logs: https://gist.github.com/

clint



Re: Indexes seem corrupted

John Chang
Here is a gist of the elastic search logs.  However, I don't know if they will be useful; they just log some activity about 2 hours before I started seeing the problems noted above in my application logs, and they seem pretty tame:
https://gist.github.com/703964


Here is some more info from my application log.  It is basically more of what I put in the original post:
https://gist.github.com/704009

I don't know if this is useful, but I can't think of anything more to post.  Let me know if there's something else that I'm missing.

Re: Indexes seem corrupted

kimchy
Administrator
It might be related to a possible corruption bug that was fixed in master (upcoming 0.13). I also fixed a possible race condition in which an index operation could get in between the recovery of an index and the creation of its mappings (the fix is the new full cluster and index level blocks). It sounds like you might have hit both of them. I assume you use the local gateway?
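
To notice this kind of loss early (for example right after a restart or recovery), one option is to compare the mappings and per-type counts against what the application expects. The following is only a minimal sketch: the base URL, the type names, and the exact shape of the _mapping response (assumed here to be keyed by index name, then by type name) are assumptions and may differ in your version.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def missing_types(mapping_response, index, expected_types):
    """Return the expected types that have no mapping in the response.

    ASSUMPTION: the response looks like {index: {type: {...}, ...}}.
    """
    present = set(mapping_response.get(index, {}))
    return sorted(set(expected_types) - present)


def check_index(base_url, index, expected_types):
    """Collect missing mappings and per-type document counts for one index."""
    mappings = fetch_json(f"{base_url}/{index}/_mapping")
    report = {
        "missing_mappings": missing_types(mappings, index, expected_types),
        "counts": {},
    }
    for doc_type in expected_types:
        result = fetch_json(f"{base_url}/{index}/{doc_type}/_count")
        report["counts"][doc_type] = result.get("count")
    return report


if __name__ == "__main__":
    # Hypothetical host and type names, for illustration only.
    print(check_index("http://localhost:9200", "index0",
                      ["email", "contact", "note", "task"]))
```

Running a check like this on a schedule and alerting when a type disappears or a count drops sharply would at least narrow down when the problem started.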

-shay.banon

On Wed, Nov 17, 2010 at 10:23 PM, John Chang <[hidden email]> wrote:

> Here is a gist of the elastic search logs.  However, I don't know if they
> will be useful; they just log some activity about 2 hours before I started
> seeing the problems noted above in my application logs, and they seem pretty
> tame:
> https://gist.github.com/703964
>
> Here is some more info from my application log.  It is basically more of
> what I put in the original post:
> https://gist.github.com/704009
>
> I don't know if this is useful, but I can't think of anything more to post.
> Let me know if there's something else that I'm missing.



Re: Indexes seem corrupted

John Chang
I think that's the problem.  Yes, we are using local search.  Also, what you (kimchy) write makes sense, as the Elastic Search data node logs here https://gist.github.com/703964 show initialization at times that correspond perfectly to when the searches started going bad in our application log (which uses the no-data nodes).

The only thing I wonder is why the Elastic Search data nodes decided to reinitialize at that time.  We did restart the data node cluster, but that was over 2 hours before the initialization shown in those logs.  What kicks off initialization other than a service restart?

Re: Indexes seem corrupted

kimchy
Administrator
It seems like the network connection between the nodes got completely broken (in the logs you can see a transport disconnect given as the reason for nodes being identified as failed).

You can try setting discovery.zen.fd.connect_on_network_disconnect to true. When such a disconnect happens, it will try to connect to the node in question again, to make sure it really cannot be reached.
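
As a sketch, assuming the nodes are configured through a YAML file such as config/elasticsearch.yml (the file name and location are assumptions; only the setting name comes from the message above), the entry might look like this:

```yaml
# Assumed to live in config/elasticsearch.yml on each node.
# On a network disconnect, retry the connection before treating the node as failed.
discovery.zen.fd.connect_on_network_disconnect: true
```

A node would normally need a restart to pick up a change made in its config file.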

-shay.banon

On Thu, Nov 18, 2010 at 12:29 AM, John Chang <[hidden email]> wrote:

> I think that's the problem.  Yes, we are using local search.  Also, what you
> (kimchy) write makes sense, as the Elastic Search data node logs here
> https://gist.github.com/703964 show initialization at times that correspond
> perfectly to when the searches started going bad in our application log
> (which uses the no-data nodes).
>
> The only thing I wonder is...why did the Elastic Search data nodes decide to
> reinitialize at that time; we did restart the data node cluster, but that
> was over 2 hours before this initialization in those logs.  What kicks off
> the initialization other than a service restart?