[Hadoop] storing data in ES using pig script

[Hadoop] storing data in ES using pig script

hanine
Hello,

I'm trying to store data in ES (head) using a Pig script and it gives me:

Input(s):
Failed to read data from "/user/hive/warehouse/books"

Output(s):
Failed to produce result in "books/book"

I'd be very thankful if someone could help me.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/979f5688-bd53-4b76-a97a-5b0359c8be75%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: [Hadoop] storing data in ES using pig script

Costin Leau
Hi,

That isn't a lot of information, so it's hard to figure out what's actually wrong - one can only guess. Can you post your
stacktrace/logs and your Pig script somewhere - like a gist?

One thing that stands out is that you mention you are using Pig, yet your path points to a Hive warehouse:
> Failed to read data from "/user/hive/warehouse/books"

From this I can infer that the issue may be that you are trying to read a Hive internal file, which Pig
can't understand, leading to the error that you see.
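If that is the case and the table is stored as plain text, one thing worth knowing is that Hive's default field delimiter for text tables is Ctrl-A ('\u0001'), not tab, so a tab-based load would mis-parse the rows. A rough sketch (the table path is from your message, but the column names here are just placeholders):

```pig
-- Sketch only: Hive-managed text tables use the Ctrl-A ('\u0001')
-- field delimiter by default, so tell PigStorage about it explicitly.
-- The column list below is invented for illustration.
books = LOAD '/user/hive/warehouse/books' USING PigStorage('\\u0001')
        AS (title:chararray, author:chararray);
DUMP books;
```

Whether this works depends on how the table was actually stored (text vs. RCFile/ORC etc.); for non-text formats you'd need something like HCatalog's loader instead.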

Cheers,



--
Costin

Re: [Hadoop] storing data in ES using pig script

hanine
Hi,

Here are my log and my Pig script.

log file :
Backend error message
---------------------
java.io.IOException: java.io.IOException: Out of nodes and retries; caught exception
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: Out of nodes and retries; caught exception
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.write(EsOutputFormat.java:147)
    at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:188)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467)
    ... 11 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:280)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
    at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
    ... 25 more

Pig script:

REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
AS (client_ip : chararray,
full_request_date : chararray,
day : int,
month : chararray,
month_num : int,
year : int,
hour : int,
minute : int,
second : int,
timezone : chararray,
http_verb : chararray,
uri : chararray,
http_status_code : chararray,
bytes_returned : chararray,
referrer : chararray,
user_agent : chararray
);
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, group.month_num,  COUNT_STAR(weblogs) as pageviews;

STORE weblog_count INTO 'weblogs2/logs2' USING org.elasticsearch.hadoop.pig.EsStorage();

And whatever I put in the LOAD it gives me the same result, even if I put the path to my desktop.

Thx


Re: [Hadoop] storing data in ES using pig script

Costin Leau
Since you are not specifying the network configuration for an Elasticsearch node, it will default to localhost:9200. This works as long as you are running Hadoop (Pig, Hive, Cascading, etc.) on the same machine as Elasticsearch - based on your exception, that is unlikely to be the case.
Try specifying the `es.nodes` parameter - see the documentation for more information.

Additionally, you seem to be using the wrong es-hadoop jar - in your script you are registering es-hadoop-1.2.0.jar (which does not support the Pig/Hive/Cascading functionality), while the stacktrace indicates you are using es-hadoop-1.3.X.jar.

Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and available in Maven Central) and no other version. I recommend starting with the examples in the reference docs, which show how to easily load and store data to/from Elasticsearch.
Once that works, consider extending your script.
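Concretely, something along these lines (EsStorage accepts configuration as 'key=value' strings; `es-host` below is a placeholder for whatever machine actually runs Elasticsearch, and the jar path is just an example):

```pig
-- Register the 1.3.0.M3 connector jar (path is an example) and point
-- the connector at the ES node explicitly, instead of relying on the
-- localhost:9200 default. Replace 'es-host' with your ES hostname/IP.
REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.3.0.M3.jar;

STORE weblog_count INTO 'weblogs2/logs2'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=es-host:9200');
```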

Hope this helps,


Re: [Hadoop] storing data in ES using pig script

hanine
Ok ,thank you so much


Re: [Hadoop] storing data in ES using pig script

hanine
Hello,
I used "elasticsearch-hadoop-1.3.0.M2"
and it gave me:

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
job_201404142111_0008    weblog_count,weblog_group,weblogs    GROUP_BY,COMBINER    Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404142111_0008_r_000000    weblogs/logs2,

Input(s):
Failed to read data from "/user/hive/warehouse/weblogs"

Output(s):
Failed to produce result in "weblogs/logs2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

I think it's better to know how things work from the beginning, so please could you tell me what I have to do (what I should start with): what should I do to configure Elasticsearch (head) to work with Hadoop, and how can I work with the Elasticsearch head plugin?

Thank you so much - really, everything you say is so helpful. Thank you.


    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:280)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
    at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
    ... 25 more

Pig script:

REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
AS (client_ip : chararray,
full_request_date : chararray,
day : int,
month : chararray,
month_num : int,
year : int,
hour : int,
minute : int,
second : int,
timezone : chararray,
http_verb : chararray,
uri : chararray,
http_status_code : chararray,
bytes_returned : chararray,
referrer : chararray,
user_agent : chararray
);
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, group.month_num,  COUNT_STAR(weblogs) as pageviews;

STORE weblog_count INTO 'weblogs2/logs2' USING org.elasticsearch.hadoop.pig.EsStorage();

And whatever I put in the LOAD it gives me the same result, even if I put the path of my desktop.

Thx



Re: [Hadoop] storing data in ES using pig script

hanine
In reply to this post by Costin Leau
Hello ,
I used "elasticsearch-hadoop-1.3.0.M2"
and it gave me:

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
job_201404142111_0008    weblog_count,weblog_group,weblogs    GROUP_BY,COMBINER    Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404142111_0008_r_000000    weblogs/logs2,

Input(s):
Failed to read data from "/user/hive/warehouse/weblogs"

Output(s):
Failed to produce result in "weblogs/logs2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
 

I think it's better to know how things work from the beginning, so please could you tell me what I have to do (what I should start with): what should I do to configure Elasticsearch (head) with Hadoop, and how can I work with elasticsearch-head?

Thank you so much; really, everything you say is so helpful. Thank you.


Re: [Hadoop] storing data in ES using pig script

Costin Leau
Glad to hear it, but note that the latest release is 1.3.0.M3. Simply check the official project page [1] and you will get all
the info [2], including the Maven download setup, for both stable and dev/snapshot releases [3].

[1] http://www.elasticsearch.org/overview/hadoop/
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html
[3] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html
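For reference, pulling it from Maven Central would look roughly like this (coordinates as far as I can tell from the install docs [3] - double-check the exact version there):

```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>1.3.0.M3</version>
</dependency>
```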


--
Costin
