Default codec

Default codec

aphalke
Hello Team,
   I am using Elasticsearch 0.90.3. According to the documentation, if I don't specify a codec in the mapping, the default one is used. However, a YourKit memory snapshot shows instances of BloomFilterPostingFormat whose delegate producer is BlockTreeTermsReader (the default). So it seems the bloom_default codec is used instead of the default codec. Is bloom_default the default codec for every field?

Thanks,
Atul.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Default codec

Adrien Grand-2
Hi Atul,

Indeed, our defaults differ slightly from Lucene's: we add a bloom filter to the _uid field. Because _uid is unique in the index by design, putting a bloom filter on top of the terms dictionary makes _uid lookups very fast. This matters for index requests, for example, since we need to check whether a document with the same _uid already exists in the index.

--
Adrien Grand
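To illustrate Adrien's point, here is a toy Python model of a bloom filter sitting in front of a terms dictionary. It is not Lucene's or Elasticsearch's implementation: the double-hashing scheme, the sizes, the plain Python set standing in for the terms dictionary, and the `type#id` form of the uids are all assumptions made for the sketch. The property that matters is that a negative answer from the filter is definitive, so checking a uid that is not in the index usually never touches the terms dictionary.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: each key sets num_hashes bits in a num_bits-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def uid_exists(uid: str, bloom: BloomFilter, terms_dictionary: set) -> tuple:
    """Return (exists, consulted_terms_dictionary) for an index-time _uid check."""
    if not bloom.might_contain(uid):
        return False, False  # fast path: the filter rules the uid out
    return uid in terms_dictionary, True  # slow path: a real terms lookup
```

In this model, indexing a brand-new document typically yields `(False, False)`: the filter alone answers the question. The cost is the bit array itself, which has to be kept in memory to be useful.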

Re: Default codec

aphalke
Thanks, Adrien, for the clarification.
Suppose we can accept the performance cost of not using bloom filters. In that case, is there a way to override this behavior and use the default postings format for _uid too? I am asking because, in a simulated environment for our application, bloom filters take around 48% of the memory.

Thanks,
Atul.
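Atul's 48% figure is easier to reason about with the textbook sizing rule for a bloom filter: for n keys and a target false-positive rate p, an optimally sized filter needs about m = -n ln(p) / (ln 2)^2 bits, so it grows linearly with the number of keys. Since _uid has one unique term per document, n is roughly the number of documents indexed. The sketch below only applies that generic rule; the false-positive rate Elasticsearch 0.90 actually uses for _uid is not stated in this thread, so the 1% in the example is an assumption, and real implementations may round sizes differently.

```python
import math


def bloom_filter_bytes(num_keys: int, false_positive_rate: float) -> int:
    """Optimal bloom filter size in bytes: m = -n * ln(p) / (ln 2)^2 bits."""
    if num_keys <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need num_keys > 0 and 0 < false_positive_rate < 1")
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)
```

For example, `bloom_filter_bytes(1_000_000, 0.01)` comes to roughly 1.2 MB, about 9.6 bits per document, so at a fixed rate the total scales with the number of documents.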

In reply to Adrien Grand's post of Monday, 28 October 2013 14:04:53 UTC+5:30.

Re: Default codec

aphalke
Hello Adrien, team,
    Is there a way to override this behavior and use the default postings format for _uid too?
Thanks in advance.
Regards,
Atul.

In reply to Atul Phalke's post of Monday, 28 October 2013 17:04:48 UTC+5:30.

Re: Default codec

prasanna sivanandam
In reply to this post by Adrien Grand-2
Adrien,

We are indexing read-only data with ES, so we will never update the indexed data. Is it possible to avoid storing the _uid field in the indices?

Prasanna

Re: Default codec

simonw-2
You can actually configure the Lucene default for this field. You need to define a custom postings format like this:

curl -XPUT 'http://localhost:9200/indexname/' -d '{
    "settings" : {
        "index" : {
            "codec" : {
                "postings_format" : {
                    "default_no_bloom" : {
                        "type" : "default"
                    }
                }
            }
        }
    }
}'
And then use it in the mapping like this:

{
    "type" : {
        "_uid" : {
            "postings_format" : "default_no_bloom"
        }
    }
}
That should be it

simon
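Simon's two steps can also be scripted. The sketch below is a hypothetical Python helper that builds one create-index body holding both the `default_no_bloom` postings format and the `_uid` mapping for the mapping type `type`. Sending settings and mappings together in a single create-index request is an assumption on my part (Simon's post sends them separately) and has not been checked against 0.90; `create_index` simply PUTs the body and needs a running node.

```python
import json
import urllib.request


def no_bloom_create_index_body(type_name: str, format_name: str = "default_no_bloom") -> dict:
    """Build a create-index body that defines a bloom-free postings format
    and assigns it to the _uid field of one mapping type."""
    return {
        "settings": {
            "index": {
                "codec": {
                    "postings_format": {
                        # A named postings format backed by the plain "default" type.
                        format_name: {"type": "default"}
                    }
                }
            }
        },
        "mappings": {
            type_name: {
                # Per-field postings format override for the _uid field.
                "_uid": {"postings_format": format_name}
            }
        },
    }


def create_index(base_url: str, index: str, body: dict) -> bytes:
    """PUT the body to {base_url}/{index}/ (requires a running node)."""
    request = urllib.request.Request(
        f"{base_url}/{index}/",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Usage against a local node would look like `create_index("http://localhost:9200", "indexname", no_bloom_create_index_body("type"))`.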

In reply to prasanna's post of Monday, November 11, 2013 8:51:29 AM UTC+1.

Re: Default codec

Anantha Govindarajan
Hi ,

I have added index.codec.postings_format.my_format.type:default to elasticsearch.yml and "_uid" : {"postings_format" : "my_format"} to my default-mapping.json file.

However, _es090_0.blm files are still being created. How do I get the default postings format for the _uid field?

Atul, we are facing the same problem. Were you able to solve it, or did you find an alternative? If so, please help me resolve this.

Re: Default codec

aphalke
Hi Simon,
          Thanks for the reply. As Anantha mentioned, it does not work for the _uid field. After some analysis, it looks like the bloom filter codec is the default for the _uid field and cannot be overridden.
Anantha,
     We were trying to resolve this issue to reduce the memory footprint. As an alternative, we are thinking of keeping only a limited number of indices open, opening and closing indices dynamically depending on the search requests. Do you have any other option besides this?

Thanks,
Atul.
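Atul's fallback, keeping only a limited number of indices open, could be built on Elasticsearch's open/close index API (`POST /{index}/_open` and `POST /{index}/_close`). The sketch below is a hypothetical helper, not something from the thread: `OpenIndexLimiter` keeps at most `max_open` indices open and closes the least recently used one when a search needs another. The HTTP call is injected as a `call(method, path)` function so the eviction logic can be exercised without a cluster, and it assumes the managed indices start out closed and that nothing else opens or closes them.

```python
from collections import OrderedDict
from typing import Callable


class OpenIndexLimiter:
    """Keep at most max_open indices open, closing the least recently used one.

    Hypothetical helper: assumes every index it manages starts out closed and
    that nothing else opens or closes those indices behind its back.
    """

    def __init__(self, max_open: int, call: Callable[[str, str], None]) -> None:
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self.call = call  # call(method, path), e.g. a thin HTTP wrapper
        self.open_indices: OrderedDict = OrderedDict()  # least recently used first

    def ensure_open(self, index: str) -> None:
        """Make sure `index` is open before a search is sent to it."""
        if index in self.open_indices:
            self.open_indices.move_to_end(index)  # mark as most recently used
            return
        if len(self.open_indices) >= self.max_open:
            victim, _ = self.open_indices.popitem(last=False)
            self.call("POST", f"/{victim}/_close")
        self.call("POST", f"/{index}/_open")
        self.open_indices[index] = None
```

A freshly opened index has to recover its shards before it can serve searches, so a real version would probably wait for the index to become available (for example with the cluster health API) before querying it.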


In reply to Anantha Govindarajan's post of Tuesday, 26 November 2013 15:01:40 UTC+5:30.

Re: Default codec

Anantha Govindarajan
Hi Atul,

Thanks for your reply. Actually, it works nicely; it just has to be verified with the Luke tool. Please go through the following post:

https://groups.google.com/forum/#!searchin/elasticsearch/luke/elasticsearch/Oi3PQqgFphQ/xoRfb54DjrQJ.

I followed the above link and built the jar, which I have attached; you can make use of it.

In Luke, on the Commits tab, look at the (A)ttributes, D, C, F info of the selected field; it shows the postings format of every field. By default it shows the es090 format for all fields, even though the bloom filter is applied only to the _uid field. So my advice is:
  • Don't change the two configurations I mentioned, run as usual, and check with Luke: every field will show the es090 postings format.
  • Now add the two configurations, set my_format for all fields, and check again: there won't be any es090 files in your index.
Thanks Simon for your advice. 
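Anantha's point is that Luke is the reliable check, since it reports es090 as the postings format of every field even though only _uid gets a bloom filter. As a rough complementary check for the `.blm` files mentioned earlier in the thread, a small hypothetical script can count the files in a shard's Lucene index directory by extension. Which directory to scan is an assumption: its location depends on your `path.data` setting and cluster layout.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(index_dir: str) -> Counter:
    """Count the files under index_dir (recursively), keyed by extension."""
    counts: Counter = Counter()
    for path in Path(index_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix or "<none>"] += 1
    return counts


def has_bloom_files(index_dir: str) -> bool:
    """True if any *.blm file (such as _es090_0.blm) exists under index_dir."""
    return count_files_by_extension(index_dir)[".blm"] > 0
```

Call `has_bloom_files()` with the path of a shard's `index` directory before and after applying the configuration, alongside the Luke check.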


Re: Default codec

simonw-2
Thanks for reporting back!

In reply to Anantha Govindarajan's post of Friday, November 29, 2013 1:48:57 PM UTC+1.