elasticsearch TransportClient

Joerg Erdmenger
Hi everybody,

I'm experimenting with elasticsearch and I like it a lot so far.
I have a question concerning the packaging, though: I'd like to use elasticsearch from a webapp, and I think the Java API would serve me well there. But it feels a bit wasteful to pull in all of elasticsearch's classes plus all of the dependencies just to use the org.elasticsearch.client.transport.TransportClient. Would it be possible to have a client artifact with just the minimum dependencies? Or would you advise just creating my own client using the REST API? If I understand correctly, that would mean I lose the autodiscovery of the nodes in the cluster.

On a related note: Is it safe to reuse instances of the TransportClient?

Thanks for the great project

Jörg

P.S.: It would also be useful to have gradle generate and install a sources artifact when running 'gradle elasticsearch:install' - but it seems there is a bug in gradle 0.8 preventing that? (I'm not much of a gradle expert yet)

Re: elasticsearch TransportClient

kimchy
Administrator
Hi,

   I am happy that you like elasticsearch so far :). Regarding the client, there is a difference between the TransportClient and Server#client(). I think you would want to use Server#client() if you want auto discovery and "one hop" operations (for example, when indexing, the request goes directly to the node that holds the shard, and not to an arbitrary node that then redirects it to the correct node). When using Server#client(), make sure you set the node.data setting to "false" if you don't want that server to participate in the allocation of shards (data).
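To make the "one hop" point concrete, here is a minimal Java sketch. It is not elasticsearch code: the class `OneHopSketch`, the modulo-based shard routing, and the "shard i lives on node i % nodeCount" layout are all assumptions made for illustration. It only counts network hops for a client that knows the shard layout versus one that always sends to a fixed entry node.

```java
public class OneHopSketch {

    // Hypothetical routing: hash the document id onto a shard, then map the
    // shard onto a node. The real hash function and shard table may differ.
    static int ownerNode(String docId, int shardCount, int nodeCount) {
        int shard = Math.floorMod(docId.hashCode(), shardCount);
        return shard % nodeCount;
    }

    // One hop if the request lands on the owning node, two if it is forwarded.
    static int hops(int entryNode, int ownerNode) {
        return entryNode == ownerNode ? 1 : 2;
    }

    // Total hops for indexing `docs` documents named doc-0, doc-1, ...
    static int totalHops(int docs, int shardCount, int nodeCount, boolean clusterAware) {
        int total = 0;
        for (int i = 0; i < docs; i++) {
            int owner = ownerNode("doc-" + i, shardCount, nodeCount);
            // A cluster-aware client picks the owner directly; the other
            // client in this sketch always talks to node 0.
            int entry = clusterAware ? owner : 0;
            total += hops(entry, owner);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("cluster-aware: " + totalHops(100, 5, 3, true));
        System.out.println("fixed entry node: " + totalHops(100, 5, 3, false));
    }
}
```

In this toy model the cluster-aware client always needs exactly one hop per document, while the fixed-entry client pays a second hop whenever node 0 does not own the target shard.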

   Regarding the source files, yeah, it's kind of a pain to do it with gradle currently, though I am leaning toward simply including the sources in the jar file. It's simple and clean, and there is no extra place to look for sources. The problem is that it means a bigger jar file.

   Dependencies: If you are using Server#client(), then most of the dependencies are required. This is because that server can potentially hold data (so the lucene jars are required, jgroups for discovery, and so on). In theory, the TransportClient should only need the netty/joda/jackson jar files, but I have not tested it... Is there a reason that you are concerned about the jar files? The benefits of running in Server#client() mode far outweigh the extra jar files, imo.

   Any Client (TransportClient or Server#client()) is built to be reused from several threads. In fact, they would start to get pretty upset if not used from several threads, since a single thread probably does not fully utilize elasticsearch (elasticsearch is highly concurrent). Note also the full async API that you get with them.
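As a rough illustration of the reuse pattern described above, the following Java sketch creates one client and shares it between several worker threads. `FakeClient` is a hypothetical stand-in, not the elasticsearch API; it only counts requests so that the example runs without any dependencies. In a real application the shared instance would be the TransportClient or the client obtained from the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedClientSketch {

    /** Hypothetical thread-safe client; it just counts the requests it sees. */
    static final class FakeClient implements AutoCloseable {
        private final AtomicInteger handled = new AtomicInteger();

        void index(String id) {
            handled.incrementAndGet();
        }

        int handled() {
            return handled.get();
        }

        @Override
        public void close() {
            // A real client would release connections and threads here.
        }
    }

    /** Creates ONE client, shares it between all workers, returns the request count. */
    static int run(int threads, int requestsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FakeClient client = new FakeClient()) {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < requestsPerThread; i++) {
                        client.index(worker + "-" + i);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the worker and surface any failure
            }
            return client.handled();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("requests handled: " + run(4, 250));
    }
}
```

The design point is simply that the client is created once and closed once, rather than once per request or per thread.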

-shay.banon


Re: elasticsearch TransportClient

Joerg Erdmenger
Hi Shay, 

thanks for that!


2010/3/22 Shay Banon <[hidden email]>
> Hi,
>
> I am happy that you like elasticsearch so far :). Regarding the client, there is a difference between the TransportClient and Server#client(). I think you would want to use Server#client() if you want auto discovery and "one hop" operations (for example, when indexing, the request goes directly to the node that holds the shard, and not to an arbitrary node that then redirects it to the correct node). When using Server#client(), make sure you set the node.data setting to "false" if you don't want that server to participate in the allocation of shards (data).

Ah, ok. So from a 'performance' point of view that is better then?
 
> Regarding the source files, yeah, it's kind of a pain to do it with gradle currently, though I am leaning toward simply including the sources in the jar file. It's simple and clean, and there is no extra place to look for sources. The problem is that it means a bigger jar file.

I wouldn't mind that.
 
> Dependencies: If you are using Server#client(), then most of the dependencies are required. This is because that server can potentially hold data (so the lucene jars are required, jgroups for discovery, and so on). In theory, the TransportClient should only need the netty/joda/jackson jar files, but I have not tested it... Is there a reason that you are concerned about the jar files? The benefits of running in Server#client() mode far outweigh the extra jar files, imo.

Hmm, I'm not that concerned if they are all needed. But the way I understood it, I didn't need many of them, and I'd like to avoid carrying around lots and lots of jars that are never actually needed.
Also, my Tomcat (when running it with Eclipse WTP) started doing funny things when I added all these dependencies (the Spring context kept restarting, I got strange log setup errors, and Tomcat logged warnings about ThreadLocals not being cleaned up). It doesn't do that when running outside of WTP (I guess there must be some funny classloading business, but I need to investigate that further).


 
> Any Client (TransportClient or Server#client()) is built to be reused from several threads. In fact, they would start to get pretty upset if not used from several threads, since a single thread probably does not fully utilize elasticsearch (elasticsearch is highly concurrent). Note also the full async API that you get with them.

Ok.

Thanks

Jörg
 

Re: elasticsearch TransportClient

kimchy
Administrator
On Mon, Mar 22, 2010 at 5:59 PM, Joerg Erdmenger <[hidden email]> wrote:
> Also, my Tomcat (when running it with Eclipse WTP) started doing funny things when I added all these dependencies (the Spring context kept restarting, I got strange log setup errors, and Tomcat logged warnings about ThreadLocals not being cleaned up). It doesn't do that when running outside of WTP (I guess there must be some funny classloading business, but I need to investigate that further).

The warnings about thread locals not being cleaned up might relate to elasticsearch: there are some static thread locals that I use in elasticsearch that are not released (though they are weakly referenced...).
 


 

Re: elasticsearch TransportClient

kimchy
Administrator
By the way, aside from the static thread locals, do you call client#close() and then (if started) server#close() when you undeploy the app? Can you post what Tomcat logs during shutdown (assuming the close methods are called)?

-shay.banon


Re: elasticsearch TransportClient

kimchy
Administrator
Fixed the thread-local leak (even the static ones :) ). They are now cleaned up when you call Server#close or TransportClient#close.
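For readers wiring this into a webapp, here is a minimal Java sketch of the shutdown order discussed in this thread: close the client first, then the server if one was started. The `Recorded` class and the `shutdown` helper are hypothetical and only record the order of the close calls; they stand in for the real client and server objects. In a servlet container, a method like `shutdown` could be called from `ServletContextListener#contextDestroyed`.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownSketch {

    /** Hypothetical closeable resource that records when it is closed. */
    static final class Recorded implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Recorded(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    /** Closes the client first, then the server if one was started (may be null). */
    static void shutdown(AutoCloseable client, AutoCloseable serverOrNull) throws Exception {
        try {
            client.close();
        } finally {
            // Still close the server even if closing the client failed.
            if (serverOrNull != null) {
                serverOrNull.close();
            }
        }
    }

    /** Runs a shutdown and returns the names of the resources in close order. */
    static List<String> demo(boolean serverStarted) throws Exception {
        List<String> log = new ArrayList<>();
        Recorded client = new Recorded("client", log);
        Recorded server = serverStarted ? new Recorded("server", log) : null;
        shutdown(client, server);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(true));
    }
}
```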

-shay.banon


Re: elasticsearch TransportClient

Joerg Erdmenger
Thanks for the ongoing work.
But I think my problems actually had nothing to do with bugs in elasticsearch. I still don't quite understand where the problem was, but I fixed it by excluding some elasticsearch logging dependencies that I have in my project anyway. In particular, the log4j dependency was pulling in a jmxri dependency from a dysfunctional java.net repository, which seemed to cause issues. As I say, I still don't quite understand what was going on, but it works now.

Jörg

P.S.: I was calling client#close on app shutdown.