What does it mean to "store" a field?

Nick Hoffman
Throughout the documentation on the website, the "store" option is mentioned. For example:

"The field is stored in the index"
"Set to yes to store actual field in the index, no to not store it."
http://www.elasticsearch.org/guide/reference/mapping/core-types.html

What are the consequences of storing, or not storing, a field in the index?

My guess is that an unstored field can't be queried, but will be returned when retrieving the document.

Thanks guys.

Re: What does it mean to "store" a field?

dadoonet
When you *index* a field, you can search it.
If you store it, you can display the content of this field if your document matches.

But, if you store the whole document (source), you can also display it.

So an unstored field can be queried, but it cannot be displayed if you have also disabled _source.
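
To make the distinction concrete, here is a minimal sketch in Python of the rules described above. It is a toy model, not Elasticsearch code: the FieldConfig class and both helper functions are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldConfig:
    """Hypothetical per-field settings (illustration only)."""
    indexed: bool  # the field can be searched
    stored: bool   # the field value is kept on its own in the index


def can_search(field: FieldConfig) -> bool:
    # Searching depends only on whether the field is indexed.
    return field.indexed


def can_display(field: FieldConfig, source_enabled: bool) -> bool:
    # A value can be returned if it is stored on its own,
    # or if it can be extracted from the stored _source.
    return field.stored or source_enabled


unstored = FieldConfig(indexed=True, stored=False)
print(can_search(unstored))                         # True
print(can_display(unstored, source_enabled=True))   # True
print(can_display(unstored, source_enabled=False))  # False
```

Under these assumptions, an indexed but unstored field stays searchable in every case and only becomes impossible to display once _source is disabled too.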

This is how I understand it.

Correct me if I'm wrong...

HTH
David ;-)
@dadoonet

Re: What does it mean to "store" a field?

Andy-3
Great question, I'd like to get a good understanding of this too. One
thing is, I think if you're going to highlight fields, then it is
better to store them too, so that they don't have to be parsed from the
source for the highlighting. That at least was my impression after
reading http://www.elasticsearch.org/guide/reference/api/search/highlighting.html

Re: What does it mean to "store" a field?

Nick Hoffman
In reply to this post by dadoonet
Cool, that makes sense. Where/how does one configure whether or not the whole document is stored? I looked around on the ES website, but couldn't find this detail.

Re: What does it mean to "store" a field?

dadoonet
http://www.elasticsearch.org/guide/reference/mapping/source-field.html

David ;-)
@dadoonet

Re: What does it mean to "store" a field?

kimchy
Administrator
Heya,

   By default in elasticsearch, the _source (the document as it was indexed) is stored. This means when you search, you can get the actual document source back. Moreover, elasticsearch will automatically extract fields / objects from the _source and return them if you explicitly ask for them (as well as possibly use them in other components, like highlighting).

   You can specify that a specific field is also stored. This means that the data for that field will be stored "on its own". Meaning that if you ask for "field1" (which is stored), elasticsearch will identify that it is stored, and load it from the index instead of getting it from the _source (assuming _source is enabled).
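
As an illustration of mapping one field as stored, here is a minimal sketch that builds such a mapping as a Python dictionary and prints it as JSON. The type name my_type and the field names are made up for the example, and "store": "yes" follows the wording quoted earlier in this thread; newer Elasticsearch releases, as far as I know, expect a boolean true instead.

```python
import json

# Hypothetical type and field names, for illustration only.
# "store": "yes" mirrors the documentation wording quoted in this thread.
mapping = {
    "my_type": {
        "properties": {
            "field1": {"type": "string", "store": "yes"},  # stored on its own
            "field2": {"type": "string"},                   # only available via _source
        }
    }
}


def stored_field_names(properties):
    """List the fields that a mapping marks as stored."""
    return sorted(
        name for name, cfg in properties.items()
        if cfg.get("store") in ("yes", True)
    )


print(json.dumps(mapping, indent=2))
print(stored_field_names(mapping["my_type"]["properties"]))  # ['field1']
```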

   When do you want to enable storing specific fields? Most times, you don't. Fetching the _source is fast and extracting it is fast as well. If you have very large documents, where the cost of storing the _source, or the cost of parsing the _source is high, you can explicitly map some fields to be stored instead.

   Note, there is a cost to retrieving each stored field. So, for example, if you have a JSON document with 10 reasonably sized fields, and you map all of them as stored, and ask for all of them, this means loading each one (more disk seeks), compared to just loading the _source (which is one field, possibly compressed).
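
The trade-off in that last paragraph can be sketched as a simple count. This is a toy model under loud assumptions: the function field_loads is hypothetical, it treats every stored value as one separate read, and it ignores field sizes, compression, and caching.

```python
def field_loads(requested, stored, source_enabled=True):
    """Count separate stored values read for one hit (toy model only)."""
    from_stored = [f for f in requested if f in stored]
    from_source = [f for f in requested if f not in stored]
    loads = len(from_stored)  # one read per individually stored field
    if from_source and source_enabled:
        loads += 1  # _source is a single stored value, read and parsed once
    return loads


fields = [f"field{i}" for i in range(1, 11)]
print(field_loads(fields, stored=set(fields)))  # 10
print(field_loads(fields, stored=set()))        # 1
```

With all 10 fields stored, the model counts 10 reads per hit; with none stored, it counts a single read of _source. For very large documents that single read can of course be the expensive one, which the count alone does not show.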

-shay.banon

Re: What does it mean to "store" a field?

Nick Hoffman
In reply to this post by dadoonet
Ah, perfect! Thanks, David.

Re: What does it mean to "store" a field?

Nick Hoffman
In reply to this post by kimchy
Great explanation, Shay. Thanks for taking the time to write that. It really clears up all of my questions.

Re: What does it mean to "store" a field?

Ray Ward
In reply to this post by kimchy
>    When do you want to enable storing specific fields? Most times, you
> don't. Fetching the _source is fast and extracting it is fast as well. If
> you have very large documents, where the cost of storing the _source, or
> the cost of parsing the _source is high, you can explicitly map some fields
> to be stored instead.
>

So if you are going to store the source (which is on by default) then
you shouldn't store individual fields as it offers no real advantage.
And if you have very large documents where you may be searching
multiple fields, but only need certain fields returned in the hit,
then you may choose to disable storing the source and then you store
individual fields instead.

Is this the intention?

Re: What does it mean to "store" a field?

kimchy
Administrator
Almost. Sometimes it also makes sense to store the _source and store other fields specifically. For example, you may need the _source now and then (say, in order to reindex), but it can be quite big; to avoid paying the price of loading and possibly parsing it, just fetch specific stored fields.
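
A search request that fetches only specific stored fields might look like the sketch below. It assumes the request-body syntax of later Elasticsearch releases (stored_fields together with "_source": false); at the time of this thread the equivalent parameter was, I believe, called fields. The field names body, title, and author are made up for the example.

```python
import json


def stored_fields_query(query_text, fields):
    """Build a search body that skips _source and returns only stored fields."""
    return {
        "query": {"match": {"body": query_text}},  # "body" is a hypothetical field
        "_source": False,                          # do not load or parse _source
        "stored_fields": list(fields),             # must be mapped as stored
    }


body = stored_fields_query("elasticsearch", ["title", "author"])
print(json.dumps(body, indent=2))
```

The _source stays in the index for the occasional reindex, while everyday searches read only the small stored fields.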

Re: What does it mean to "store" a field?

vineeth mohan
So let me get this idea right.
If store is yes and _source is enabled, are two copies of the same field stored in ES?
Also, is compression applied to "stored" fields too?

I am looking at all possible ways to reduce storage size without sacrificing the highlighting functionality.
If there is any other good option apart from the aforementioned, let me know...

Thanks
          Vineeth

RE: What does it mean to "store" a field?

dadoonet

Hi,

I have much the same question regarding highlighting of attachments.

It seems that if you disable _source for documents with an attachment field, you can't highlight them, even if you mark the attachment field as stored.

The documentation says:

"In order to perform highlighting, the actual content of the field is required. If the field in question is stored (has store set to yes in the mapping), it will be used, otherwise, the actual _source will be loaded and the relevant field will be extracted from it."
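
The fallback order in that documentation passage can be sketched as follows. This is only a model of the documented behavior, not Elasticsearch's implementation; the function highlight_text and the sample field names are hypothetical.

```python
def highlight_text(field, stored_values, source):
    """Pick the text to highlight, following the documented fallback order.

    stored_values: dict of individually stored field values for the hit.
    source: the parsed _source dict, or None when _source is disabled.
    Returns (text, origin).
    """
    if field in stored_values:
        return stored_values[field], "stored"
    if source is not None and field in source:
        return source[field], "_source"
    return None, "unavailable"


print(highlight_text("file", {"file": "some text"}, None))    # ('some text', 'stored')
print(highlight_text("file", {}, {"file": "some text"}))      # ('some text', '_source')
print(highlight_text("file", {}, None))                       # (None, 'unavailable')
```

According to the documentation, then, a stored field should be enough for highlighting even with _source disabled, which is why the behavior seen with attachments is surprising.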

@Shay, can you confirm that, or should I run more tests to find out how to do it?

David.

Re: What does it mean to "store" a field?

vineeth mohan
I tried some tests; the index sizes for each configuration are listed below.
PS: the test data I used was the base64 encoding of some binary data.

With

Store=yes, _source=enable: 1.3 MB
Store=No, _source=disable: 1 MB
Store=No, _source=disable: 600 KB
Store=No, _source=enable, compress=on: 900 KB

So I come to the conclusion that, by default, the same string is stored twice.

Thanks
          Vineeth

Re: What does it mean to "store" a field?

kimchy
Administrator
In reply to this post by dadoonet
Yes, when the field is stored, it will be used for highlighting. That's the behavior.

Re: What does it mean to "store" a field?

Saurabh-3
In reply to this post by kimchy
Is there any way to mark fields as stored using the Java API?

Re: What does it mean to "store" a field?

Saurabh-3
In reply to this post by kimchy
Also, how do I mark _source as not stored?

Re: What does it mean to "store" a field?

Ivan Brusic
You can specify how the source is handled (stored, not stored, compressed)
in the mapping:

http://www.elasticsearch.org/guide/reference/mapping/source-field.html
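
For illustration, a mapping that turns off _source could be built like this. It is a minimal sketch: the type name my_type and the helper source_mapping are made up, and I am leaving the compressed mode out because its option names depend on the Elasticsearch version.

```python
import json


def source_mapping(type_name, enabled=True):
    """Build a mapping fragment that enables or disables storing _source."""
    return {type_name: {"_source": {"enabled": enabled}}}


# "my_type" is a hypothetical type name.
print(json.dumps(source_mapping("my_type", enabled=False)))
# {"my_type": {"_source": {"enabled": false}}}
```

Keep in mind that with _source disabled, only fields explicitly mapped as stored can be returned or highlighted.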

Re: What does it mean to "store" a field?

Prashant Agrawal
In reply to this post by kimchy
Hi All,

I have a related question about the following scenario.

1) Say I am indexing a huge amount of data (say 500 GB) per day.
2) Each document contains about 15 fields, including attachments.
3) In the current architecture, _source is left at its default (enabled).
4) Almost all fields are also set to store: true.
5) I run two types of query:
a) Retrieving docs by ID, which returns the complete source, and
b) On the basis of some fields, say 4 or 5 fields.

Given these queries and this amount of data, does it really make sense to store the source as well as the individual fields? ES returns the data as fields even when they are not stored (as long as _source is stored).

In some posts I have read that retrieval from stored fields is faster when there are fewer fields and less data. But with large data, which is faster: retrieving from stored fields or retrieving from _source?

Please confirm.

~Prashant

Re: What does it mean to "store" a field?

Nikolas Everett
My understanding is that it is mostly more efficient to not store any fields and just let Elasticsearch load them from the source when needed.

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1jz8QDjd074xhJm2bDetVH4svzSbWc7970oCYWQw6wOg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: What does it mean to "store" a field?

joergprante@gmail.com
Loading from _source, especially in scripts, is slow and uses additional memory. Also note that a large _source field is compressed, which adds a bit more CPU overhead.

With stored fields, you have finer control over these issues. So it might make sense to compress large binary data fields: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#binary

Jörg