Elasticsearch Java client much slower than rest call

DMAC
Hi 

We are starting to use the Java API in Elasticsearch. The only problem is that queries seem to take much longer to retrieve data than simply using curl.

Our development server (19.08) has a very small index (2,000 documents with 8 fields).

Running a query that retrieves ~1,200 documents takes 17 seconds from Java, versus under 1 second to get the same result using curl.

Here is the code I am using to test ES:

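import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.fieldQuery;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.search.SearchHit;

// Connect with the transport client; sniffing lets it discover the other cluster nodes.
LOG.info(String.format("Initializing connection to ElasticSearch %s/%s on %d", host, clusterName, port));
settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(host, port));
searchRequest = client.prepareSearch(clusterName);
mapper = new SerObjectMapper();

// One "should" clause per keyword against the "body" field.
BoolQueryBuilder query = boolQuery();
for (String term : keywordList) {
    query.should(fieldQuery("body", term));
}

long start = System.currentTimeMillis();
/** From here */
// Note: prepareSearch() takes index names; clusterName is used as the index here.
SearchResponse response = client.prepareSearch(clusterName)
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(query).setFrom(0).setSize(limit).setExplain(true)
        .execute()
        .actionGet();
/** To here takes 17 seconds in Java. */
SearchHit[] docs = response.getHits().getHits();
System.err.println("Query took " + (System.currentTimeMillis() - start) + " ms");
for (SearchHit doc : docs) {
    urlList.add((String) doc.getSource().get("url"));
}
LOG.info(String.format("Returning %d results", urlList.size()));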

Could you point out what I am doing wrong?

Thanks in advance 

D.

Re: Elasticsearch Java client much slower than rest call

joergprante@gmail.com
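Switch off explain: use setExplain(false).

Unfortunately, setExplain(true) appears in the example in the docs (http://www.elasticsearch.org/guide/reference/java-api/search.html), but explain is not the default; it is only an optional setting. With explain on, every returned hit carries a scoring explanation, which has to be computed and transferred for all ~1,200 hits.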

Best regards,

Jörg

In reply to DMAC's post of Tuesday, November 13, 2012, 8:30:53 PM UTC+1.

Re: Elasticsearch Java client much slower than rest call

Derry O' Sullivan
In reply to this post by DMAC
Two other points on this:

1. I'm not sure what limit is (1,200?), but returning that many hits instead of the default 10 makes a big difference.
2. Are you doing exactly the same search in the REST call (e.g. the DFS_QUERY_THEN_FETCH search type, the same number of results, etc.)?

We have done lots of testing over both HTTP/REST and the Java API, with many search types and limits, and I don't think I've ever seen a difference like that (or anything near it) in timings (using ES 0.19.9 on a multi-node cluster with millions of docs).
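For point 2, one way to make sure curl and the Java client run the same search is to generate the REST request from the same parameters the Java code uses (keywords, from/size, search type, explain). The sketch below assumes the 0.19-era query DSL, where `fieldQuery("body", term)` corresponds to a `field` query and `search_type` is passed as a URL parameter; the class and method names (`RestEquivalent`, `searchUrl`, `searchBody`) are hypothetical, and the JSON escaping only handles quotes and backslashes.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds a REST search request mirroring the Java one
// (bool query of "field" queries on "body", DFS_QUERY_THEN_FETCH, from/size, explain).
final class RestEquivalent {

    // The search type goes in the URL for the REST _search endpoint.
    static String searchUrl(String baseUrl, String index) {
        return baseUrl + "/" + index + "/_search?search_type=dfs_query_then_fetch";
    }

    // Minimal JSON string escaping: backslashes first, then double quotes.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String searchBody(List<String> keywords, int from, int size, boolean explain) {
        StringJoiner should = new StringJoiner(",", "[", "]");
        for (String term : keywords) {
            // One "field" query per keyword, mirroring fieldQuery("body", term).
            should.add("{\"field\":{\"body\":" + quote(term) + "}}");
        }
        return "{\"query\":{\"bool\":{\"should\":" + should + "}},"
                + "\"from\":" + from
                + ",\"size\":" + size
                + ",\"explain\":" + explain + "}";
    }
}
```

For instance, `searchBody(List.of("foo"), 0, 1200, true)` returns `{"query":{"bool":{"should":[{"field":{"body":"foo"}}]}},"from":0,"size":1200,"explain":true}`, which can be sent with `curl -XGET '<url>' -d '<body>'`. Keeping `size` and `explain` identical on both sides is what makes the timings comparable.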

Re: Elasticsearch Java client much slower than rest call

DMAC
Hi,

Thanks, and sorry for the slow response. It turns out the problem was my fault: it was the way I was serialising the data.

Regards

D.
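Since the time turned out to be spent in serialisation rather than in the search itself, it can help to time each phase separately instead of wrapping everything in one stopwatch. Below is a minimal plain-Java sketch under that assumption; `PhaseTimer` and the phase names are hypothetical, and in real code the suppliers would wrap the `prepareSearch(...)...actionGet()` chain and the serialisation step respectively.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper that records wall-clock time per named phase,
// so a slow search can be told apart from slow post-processing.
final class PhaseTimer {
    // Insertion-ordered so phases print in the order they ran.
    private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

    // Runs one phase, adds its duration to that phase's total, and returns its result.
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long ms = (System.nanoTime() - start) / 1_000_000;
            elapsedMillis.merge(phase, ms, Long::sum);
        }
    }

    Map<String, Long> elapsedMillis() {
        return elapsedMillis;
    }
}
```

With one `time("search", ...)` call around the query and one `time("serialise", ...)` call around the conversion of hits, printing `elapsedMillis()` shows which phase accounts for the delay.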

In reply to Derry O' Sullivan's post of 14 Nov 2012, 08:41.