Indexing ElasticSearch with Hadoop (LUCENE_36)

Indexing ElasticSearch with Hadoop (LUCENE_36)

Davide Palmisano
Dear all,

I need your help. Here's a high-level description of what I'm trying to do, followed by some details.

I'm running a Hadoop job that bulk-indexes documents into a two-node ElasticSearch cluster.
Some details on the configuration follow:

- Hadoop, running on Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
- elasticsearch 0.90.3, running on Java(TM) SE Runtime Environment (build 1.6.0_45-b06)

Using a TransportClient, each reducer task tries to connect to the cluster as shown here [https://gist.github.com/dpalmisano/6251563], line 120.
Unfortunately, when the reducer tasks try to connect, they consistently fail with this exception:

2013/08/16 16:48:52,604 [main] FATAL org.apache.hadoop.mapred.Child - Error running child : java.lang.NoSuchFieldError: LUCENE_36
at org.elasticsearch.Version.<clinit>(Version.java:42)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:165)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:121)
at com.dpalmisano.mapred.es.ElasticSearchBulkOutputFormat$ElasticSearchBulkRecordWriter.start_embedded_client(ElasticSearchBulkOutputFormat.java:151)
at com.dpalmisano.mapred.es.ElasticSearchBulkOutputFormat$ElasticSearchBulkRecordWriter.<init>(ElasticSearchBulkOutputFormat.java:67)
at com.dpalmisano.mapred.es.ElasticSearchBulkOutputFormat.getRecordWriter(ElasticSearchBulkOutputFormat.java:160)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:583)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Digging around, this seems to be caused by an old version of Lucene on the classpath. My Hadoop job's dependency tree is fine (it only pulls in the Lucene deps coming from elasticsearch 0.90.3), but Hadoop itself ships an older version (2.9.4).
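To confirm which jar wins at runtime, a quick check (a minimal diagnostic sketch, not part of the gist above) is to print where the JVM loaded Lucene's Version class from, run with the same classpath as the task:

import org.apache.lucene.util.Version;

public class LuceneOriginCheck {
    public static void main(String[] args) {
        // Prints the jar that org.apache.lucene.util.Version came from.
        // If it points at Hadoop's bundled Lucene 2.9.x, that explains the
        // failure: LUCENE_36 only exists from Lucene 3.6 on, hence the
        // NoSuchFieldError raised in org.elasticsearch.Version.<clinit>.
        System.out.println(Version.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}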

Has anybody met this problem before? I'd really appreciate any hints or help.

thanks in advance


Re: Indexing ElasticSearch with Hadoop (LUCENE_36)

Costin Leau
The older Lucene jar (the one from Hadoop) gets picked up because Hadoop starts first, so by the time your job executes, the old Lucene classes are already loaded.
You could try replacing the Lucene jar in Hadoop (remember you have to do that across your entire cluster).
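Alternatively, depending on your Hadoop version, you may be able to make the job's own jars take precedence over Hadoop's without touching the cluster. A sketch, assuming a Hadoop release that supports the user-classpath-first property (check the exact property name against your distro):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class EsIndexingDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hadoop 1.x property; on Hadoop 2.x it is
        // "mapreduce.job.user.classpath.first".
        conf.setBoolean("mapreduce.user.classpath.first", true);
        Job job = new Job(conf, "es-indexing");
        // ... rest of the job setup as in the gist ...
        // Task JVMs should now load the Lucene 3.6 jars shipped with the
        // job ahead of Hadoop's bundled Lucene.
    }
}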

Speaking of which, have you looked at elasticsearch-hadoop?



Re: Indexing ElasticSearch with Hadoop (LUCENE_36)

Davide Palmisano
Thanks Costin,

I really appreciate your prompt response. Unfortunately, it's not feasible to replace the libs across the whole cluster (which is pretty huge), and given the major version change in Lucene, I doubt it would work out fine anyway.

I was looking at elasticsearch-hadoop, but I have a question:

1) How does the mapping between what the reducer writes and elasticsearch work? Are reducer keys mapped to ES ids?
My reducers write a key, which should become the ES document id, and a JSON document, which should be indexed as the document body in ES.

Do you think elasticsearch-hadoop could help in this scenario?

All the best,


Re: Indexing ElasticSearch with Hadoop (LUCENE_36)

Costin Leau
The upcoming beta (ETA next week) doesn't have a concept of an id as such: by default the id is generated by elasticsearch, but you can configure an id path to tell ES where to pick the id from inside the document.
It looks like you're using plain M/R. In that case, you can pass a Map<Writable> that represents the JSON document. That's because in most cases the data is read in M/R, Pig, or Hive as native types; once the results are in, we handle the JSON conversion and the HTTP communication (so the user doesn't have to).

However, we do plan to allow JSON documents to be passed as-is, without having to be converted to Writable objects. As an interim workaround, you could load the JSON back as a Map<Writable> (see the WritableUtils in ES-Hadoop) or, if you're generating the JSON from Writables, pass those directly to es-hadoop.
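For plain M/R that would look roughly like the sketch below. This is a sketch only: the class and property names (EsOutputFormat, es.nodes, es.resource, es.mapping.id) are taken from the es-hadoop docs and may differ in the beta, so double-check them against your version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class EsHadoopDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "esnode1:9200");      // hypothetical ES host
        conf.set("es.resource", "myindex/mytype"); // hypothetical target index/type
        conf.set("es.mapping.id", "id");           // document field to use as the ES id
        Job job = new Job(conf, "es-bulk-index");
        job.setOutputFormatClass(EsOutputFormat.class);
        // ... mapper/reducer setup; each reducer emits a MapWritable per
        // document (with an "id" entry), and es-hadoop handles the JSON
        // conversion and the bulk HTTP calls.
    }
}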

Hope this helps,




Re: Indexing ElasticSearch with Hadoop (LUCENE_36)

Michael Sick
For a quick-and-dirty approach, I've had M/R jobs create bulk index statements and just used curl or the Jersey REST client to post them.
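For example, with plain java.net instead of curl (a minimal sketch; the host, index, and document are made up):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BulkPost {
    public static void main(String[] args) throws Exception {
        // Bulk body: one action line per document, each followed by the
        // document source, with a trailing newline as the _bulk API requires.
        String bulk =
            "{\"index\":{\"_index\":\"myindex\",\"_type\":\"mytype\",\"_id\":\"1\"}}\n"
          + "{\"field\":\"value\"}\n";

        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://localhost:9200/_bulk").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(bulk.getBytes("UTF-8"));
        out.close();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}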

Michael Sick | Big Data Architect
Serene Software Inc.
919-523-4447 (cell) | [hidden email] | www.serenesoftware.com
Core: ElasticSearch | HBase | Hadoop | RedShift/ParAccel | Hive




Re: Indexing ElasticSearch with Hadoop (LUCENE_36)

Davide Palmisano
Thanks Michael,

but the way suggested by Costin worked like a charm! 

Thank you very much for your help,

Davide


--
Davide Palmisano

http://davidepalmisano.com
http://twitter.com/dpalmisano
