
Regarding upgrading Elastic Search server from 0.18.3


Regarding upgrading Elastic Search server from 0.18.3

girish khadke
Hi,

We are running a 2-node Elasticsearch cluster on JDK 6 (1.6.0_26) with a 12 GB heap (4 indexes x 5 shards per index, and over 60 GB of index data per node).

We are observing performance problems with some of our Elasticsearch queries: they take on the order of minutes to execute per shard.

Writes to the cluster are fairly low (at most around 10 writes/second).

The read load is higher than the write load.

While profiling Elasticsearch we observed far too many char[], String, Term, and TermInfo objects on the heap during high load. We are wondering whether we can upgrade to a version of Elasticsearch with a better memory consumption strategy, without causing problems for our existing data and cluster.

We are currently on Elasticsearch 0.18.3, and our questions are:
1.   Should we upgrade across a major version change? Will it cause problems?
2.   I see that the 0.18.5 release has Lucene updates that improve memory consumption; should we use that?
3.   How do we upgrade a running production cluster without downtime (rolling upgrade)?
4.   Should we update the JDK version too, and tweak the heap settings as well?

Please let us know at your earliest convenience.


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Regarding upgrading Elastic Search server from 0.18.3

joergprante@gmail.com
On 13.02.2013 23:31, girish khadke wrote:
> the questions that we have are:
> 1.   Should we upgrade to a major version change?  Will it cause problems?
It is always recommended to use the most recent version of Elasticsearch,
due to bug fixes and performance improvements.

> 2.   I see that 0.18.5 version has updates to Lucene index which
> improves memory consumption, should we use that?

No. 0.18.5 contains an outdated Lucene 3.5; there is no reason to use
such an old Lucene version.
> 3.   How do we upgrade a running production cluster without having a
> downtime (rolling upgrade)?
In your case, I doubt you can. You should upgrade Java, Lucene, and ES,
all three, to their next major versions. Rolling upgrades are for situations
where you change minor JVM versions, update to a minor ES version, or
change cluster/index configs within the same version.
> 4.   Should we update JDK version too and tweak with heap settings also?
Yes. I recommend the latest Java 7. Java 6 is no longer supported by
Oracle: http://www.oracle.com/technetwork/java/javase/eol-135779.html

Note, heap settings are not the only settings you should take care of.
There are a lot of filter/cache tunables. Without knowing your queries,
it is hard to tell more.
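For example, the filter and field cache tunables of that era live in elasticsearch.yml. A sketch using 0.x-era setting names (verify them against the documentation for your exact version; the values below are illustrative, not recommendations):

```yaml
# Per-index field cache (used for sorting and faceting)
index.cache.field.type: soft       # let the GC reclaim entries under memory pressure
index.cache.field.max_size: 50000  # cap the number of cached entries
index.cache.field.expire: 10m      # evict idle entries

# Node-level filter cache
indices.cache.filter.size: 20%
```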

Best regards,

Jörg




Re: Regarding upgrading Elastic Search server from 0.18.3

girish khadke
Our question is whether we should upgrade the cluster directly from 0.18.3 to the latest stable version, 0.20.4, along with all the other components, such as the JDK.

We are trying to figure out the cause of frequent brownouts in our production Elasticsearch environment. During analysis we found that we have a GC problem under load: the JVM fails to free memory with the CMS collector, causing long GC pauses.

We have tried different heap sizes, from our initial 2 GB to 3 GB and now 8 GB given to Elasticsearch, but we suspect we could see brownouts like this again in the future.

We have never done any JVM-level tuning on our Elasticsearch clusters.

Are there any good links or advice on this?



 
 

Re: Regarding upgrading Elastic Search server from 0.18.3

joergprante@gmail.com
If you select smaller heap sizes, you can watch the heap development
more quickly, because GCs happen earlier and more often.
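One way to watch that development from inside the JVM, using only the plain JDK and no Elasticsearch dependency, is via the memory and GC MXBeans:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcWatch {
    public static void main(String[] args) {
        // Current heap usage of this JVM
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("heap used: " + mem.getHeapMemoryUsage().getUsed() + " bytes");

        // Cumulative collection count and time per collector since JVM start;
        // polling these periodically shows how often and how long GCs run.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + "ms");
        }
    }
}
```

Polling this in a loop (or logging it from your client) gives a rough picture of GC frequency without attaching a profiler.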

Be aware that Java 6 was not designed with heaps larger than ~8 GB in
mind, so there is a subtle barrier. I understand you have a 12 GB heap
running; for Java 6, this is a challenge. Java 7 is designed to deal
with larger heap sizes more easily.

Note, some older Java 6 JVMs have regressions, which are fixed in later
versions. If Java 7 is not an option, switch to the latest Java 6 JVM.

But just by changing the JVM, you can't solve every case where the
capacity of a cluster is exhausted. Every cluster has a certain limit,
and if your cluster's resources are exhausted, you have to grow the
cluster.

Besides JVM tweaking: before changing parameters in places you are not
sure about, make sure you understand the reason for the situation.
You can also analyze memory consumption by using diagnostic messages
in your client to track down the issue: is it a facet/filter/cache
allocation problem? Is it caused by badly written queries? Or by sheer
query load? Without these facts, you can't expect a definitive answer.
Maybe you can tune the queries in your app, or maybe you can configure
caching correctly. Maybe you can mend the situation by just adding
more nodes, which is very easy in Elasticsearch.

Jörg





Re: Regarding upgrading Elastic Search server from 0.18.3

Clinton Gormley-2
> 3.   How do we upgrade a running production cluster without having a
> downtime (rolling upgrade)?

I wrote up an explanation of how we did a major upgrade without downtime
here:

https://gist.github.com/clintongormley/3888120

clint




Re: Regarding upgrading Elastic Search server from 0.18.3

girish khadke
We run the following query, a range query with a sort-by-timestamp criterion:

    EndUserTxnReportSearchCriteria criteria = new EndUserTxnReportSearchCriteria();
    criteria.setJurhash(getReqAccount().getJurHash());
    for (String operation : operationList) {
        criteria.addOperation(operation);
    }
    criteria.setSortDir("DESC");
    criteria.setSortBy("Date");

    Date endDate = Calendar.getInstance(getUserTimeZone()).getTime();
    criteria.setEndDate(endDate);
    Calendar startDateCal = Calendar.getInstance();
    startDateCal.add(Calendar.YEAR, -1);
    criteria.setStartDate(startDateCal.getTime());

    QueryBuilder jh = QueryBuilders.termQuery(
            ElasticSearchTransactionTypeUtil.Fields.account.toString(), criteria.getJurhash());
    BoolQueryBuilder boolQb = QueryBuilders.boolQuery().must(jh);

    // Add operations if present
    if (criteria.getOperations() != null && !criteria.getOperations().isEmpty()) {
        for (String operation : criteria.getOperations()) {
            boolQb.should(QueryBuilders.termQuery(
                    ElasticSearchTransactionTypeUtil.Fields.operation.toString(), operation));
        }
        boolQb.minimumNumberShouldMatch(1);
    }

    // Add userID to query
    if (criteria.getExtUserId() != null && !criteria.getExtUserId().equals("")) {
        QueryBuilder uid = QueryBuilders.wildcardQuery(
                ElasticSearchTransactionTypeUtil.Fields.user.toString(),
                "*" + criteria.getExtUserId().toLowerCase() + "*");
        boolQb = boolQb.must(uid);
    }

    // Build query: the bool query wrapped in a timestamp range filter
    QueryBuilder qb = QueryBuilders.filteredQuery(boolQb,
            FilterBuilders.rangeFilter(ElasticSearchTransactionTypeUtil.Fields.timeStamp.toString())
                    .from(criteria.getStartDate())
                    .to(criteria.getEndDate())
                    .includeLower(true)
                    .includeUpper(false));

    // Get client
    TransportClient client = getClient();
    if (client != null) {
        try {
            if (log.isDebugEnabled())
                log.debug("Query:" + new String(qb.buildAsBytes(XContentType.JSON)));

            // Build and execute the search request
            SearchRequestBuilder requestBuilder = client.prepareSearch()
                    .setOperationThreading(SearchOperationThreading.THREAD_PER_SHARD)
                    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                    .setQuery(qb.buildAsBytes())
                    .setFrom(firstResult - 1)
                    .setSize(pageSize)
                    .setIndices(esIndices)
                    .addSort(getSortBy(criteria.getSortBy()), getSortDir(criteria.getSortDir()))
                    .setExplain(false);

            SearchResponse response = requestBuilder.execute().actionGet(getTimeoutinmillis());

            if (log.isDebugEnabled())
                log.debug("SearchResponse:" + response.toString());

            List<EventLog> eventLogList = new ArrayList<EventLog>();
            try {
                SearchHit[] hits = (response == null || response.getHits() == null)
                        ? null : response.getHits().getHits();
                if (hits != null && hits.length > 0) {
                    for (int i = 0; i < hits.length; i++) {
                        Map<String, Object> map = hits[i].sourceAsMap();
                        EventLog eventLog = ElasticSearchTransactionTypeUtil.convertFromESDataToEventLog(map);
                        if (log.isDebugEnabled())
                            log.debug("Adding event log " + eventLog);
                        eventLogList.add(eventLog);
                    }
                    return eventLogList;
                }
            } catch (Exception e) {
                log.error("Error while parsing the results from elastic search.");
                throw new RuntimeException(e);
            }

            return eventLogList;
        } finally {
            // The client is never closed here because it is a singleton.
            // if (client != null) client.close();
        }
    }
    return null;
}

Timestamp is a very high cardinality field (almost unique; we have a very large number of such unique terms in our data). I think sorting by timestamp is what is causing problems with these queries. When we search over the last two years of data, the search fails to return results within the 20 s timeout. We face this issue intermittently and are trying to debug why it happens.

We currently use Elasticsearch 0.18.3.

Is there a better way of writing the above query to get data for reporting? Is there some functionality like limit() in the Elasticsearch search API?

It also looks like we need to move to a newer version of Lucene (3.6+) to improve memory usage.


Thanks and regards,
Girish Khadke


 
 

Re: Regarding upgrading Elastic Search server from 0.18.3

joergprante@gmail.com
Yes, I am quite familiar with that kind of requirement. Sorting values
on an inverted index is heavy: it produces tabular data but has to work
from inverted-indexed documents. With low cardinality and a reasonable
heap size, the challenge often goes unnoticed. Too high a cardinality
of the field swamps the heap. And it is even more challenging in
situations where you only need the top-ranked documents, because the
largest part of the sorting computation is wasted; it is never used for
delivering results. I see you are fetching documents page by page.

There are some options:

- reduce timestamp cardinality by creating buckets: maybe it is
possible to sort by month, week, day, hour, or minute (rather than at a
resolution as fine as seconds or milliseconds)

- avoid sorting at all: boost documents at indexing time according
to their age, and use relevance scoring

- use time-based rolling indices to distribute the timestamps across
many indices

- precompute document order: put your documents in an index with static
page counters, so you can retrieve them page by page (if you have a
static paging function)

- brute force: bring up more hardware (RAM) and increase the heap, and
continue to sort (even this strategy will cause delays once you exceed
a certain limit, around some dozens of GB, because loading values into
the heap for sorting takes noticeable time even when ES is mlockall()'d)
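The bucketing option can be sketched in plain Java: at indexing time, derive a coarse sort key from the raw timestamp and index it as an extra field, then sort on that field instead. The day resolution and method name below are illustrative, not from the original code:

```java
import java.util.concurrent.TimeUnit;

public class TimestampBuckets {
    // Truncate a millisecond timestamp to day resolution.
    // Sorting on the bucketed value needs far fewer unique terms
    // in the field cache than sorting on raw milliseconds.
    static long toDayBucket(long epochMillis) {
        return epochMillis - (epochMillis % TimeUnit.DAYS.toMillis(1));
    }

    public static void main(String[] args) {
        long a = 1360796400123L; // a moment on 2013-02-13 (UTC)
        long b = 1360799999999L; // a later moment on the same UTC day
        // Both collapse to the same day bucket, so they compare equal as sort keys.
        System.out.println(toDayBucket(a) == toDayBucket(b)); // true
        // The bucket value is aligned to a day boundary.
        System.out.println(toDayBucket(a) % TimeUnit.DAYS.toMillis(1) == 0); // true
    }
}
```

Ties within a bucket can then be broken cheaply (e.g. by _doc order or a secondary low-cardinality key), which keeps the sort's memory footprint bounded.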

You can't get around the issue just by updating to the latest
Elasticsearch or the latest JVM.

Jörg





Re: Regarding upgrading Elastic Search server from 0.18.3

girish khadke
We are also wondering whether longer GC pauses could be the issue. Would GC tuning alone solve the problem?

Regards,
Girish


 
 
