doc(), source() and "stored" property

Vadim Voituk
Hello, 

I have an index of medium-sized documents (100 KB of index size per document).
Each of these documents has a number of "filterable" values. Let's call them the "rules list".
Almost every query filters on this "rules list".
The filtering is implemented with a native script.
The fields of the "rules list" are also marked as "stored".

So, in this particular case it would be better to keep these rules in the document but not in the "source", and to avoid loading the "source" into memory during filtering at all.

Since I've marked the "rules" fields as stored, I expected these values to be available in the native script like this:

    doc().field("rules")

But they are not. The "rules" are available only via

    source().get("rules")
    
I guess that in this case the source has to be loaded and parsed during the filtering phase, which is not efficient at all.

Should it work as I expected, or am I making a wrong assumption?







Re: doc(), source() and "stored" property

kimchy
Administrator
When you call doc(), it goes to the field cache, which means that the indexed terms for the field are loaded into memory and provided. When you use source(), it loads the source and parses it; when you use fields(), it loads a stored field. Loading from the index (either specific stored fields or the source) is slow, so it's recommended that you use the doc() part.


Re: doc(), source() and "stored" property

Vadim Voituk
Hello, Shay

Thanks for your answer, that's exactly what I want to do.
But the problem is that the value I need is not present in doc().

I get this exception when I try to access the "stored" field via doc().field("variants") or doc().get("variants")
("variants" is the name of the "rules list" from my initial question):

Caused by: org.elasticsearch.ElasticSearchIllegalArgumentException: No field found for [variants]
        at org.elasticsearch.search.lookup.DocLookup.get(DocLookup.java:110)
        at org.elasticsearch.search.lookup.DocLookup.field(DocLookup.java:87)
        at com.voituk.VariantsFilterScriptFactory$VariantsFilterScript.run(VariantsFilterScriptFactory.java:57)

And here is my _mapping for the "variants" field (it's a collection).

{
    "properties": {
        "variants": {
            "properties": {
                "id": {"type": "integer"},
                "filter1": {"type": "integer", "store": "yes", "index": "not_analyzed"},
                "filter2": {"type": "integer", "store": "yes", "index": "not_analyzed"},
                "filter3": {"type": "string", "store": "yes", "index": "not_analyzed"},
                "filter4": {"type": "integer", "store": "yes", "index": "not_analyzed"}
            }
        }
    }
}

Why isn't it available via doc()? What am I doing wrong here?


-- 
Vadim




Re: doc(), source() and "stored" property

kimchy
Administrator
variants is an object; you can only access specific fields in doc() (from the field cache) or in fields() (from specific stored fields), something like variants.id.
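
One way to picture this: an object in the mapping is only a container, and only its leaf fields exist as fields that a lookup can name. The following Java sketch is a toy model rather than Elasticsearch code; the class MappingLeafPaths and its methods are made up for illustration. It walks a mapping shaped like the one posted above and lists the leaf names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Elasticsearch code: lists the leaf field names of a mapping.
class MappingLeafPaths {

    // Walks the mapping recursively; an entry with "properties" is an object, anything else is a leaf.
    @SuppressWarnings("unchecked")
    static List<String> leafPaths(String prefix, Map<String, Object> properties) {
        List<String> paths = new ArrayList<String>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Map<String, Object> definition = (Map<String, Object>) entry.getValue();
            Object children = definition.get("properties");
            if (children instanceof Map) {
                paths.addAll(leafPaths(path, (Map<String, Object>) children));
            } else {
                paths.add(path);
            }
        }
        return paths;
    }

    // Builds a leaf definition such as {"type": "integer"}.
    static Map<String, Object> leaf(String type) {
        Map<String, Object> definition = new LinkedHashMap<String, Object>();
        definition.put("type", type);
        return definition;
    }

    // Mirrors the shape of the "variants" mapping from this thread.
    static Map<String, Object> variantsMapping() {
        Map<String, Object> inner = new LinkedHashMap<String, Object>();
        inner.put("id", leaf("integer"));
        inner.put("filter1", leaf("integer"));
        inner.put("filter2", leaf("integer"));
        inner.put("filter3", leaf("string"));
        inner.put("filter4", leaf("integer"));
        Map<String, Object> variants = new LinkedHashMap<String, Object>();
        variants.put("properties", inner);
        Map<String, Object> root = new LinkedHashMap<String, Object>();
        root.put("variants", variants);
        return root;
    }

    public static void main(String[] args) {
        // Prints the five leaf paths; the bare name "variants" is not among them.
        System.out.println(leafPaths("", variantsMapping()));
    }
}
```

Running main prints [variants.id, variants.filter1, variants.filter2, variants.filter3, variants.filter4]. The bare name variants does not appear, which is consistent with the "No field found for [variants]" error above.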


Re: doc(), source() and "stored" property

Vadim Voituk
Thanks, Shay, that's much clearer to me now.

To be precise, "variants" is not an object but a list/array of objects.
Something like:


{
  //... other fields ...
  "variants": [
    {
      "title": "Variant #1",
      "filter1": "... some value of filter #1",
      "filter2": "... some value of filter #2",
      "filter3": "... some value of filter #3"
    },
    {
      "title": "Variant #2",
      "filter1": "... another value of filter #1",
      "filter2": "... another value of filter #2",
      "filter3": "... another value of filter #3"
    },
    // more variants there...
  ],
  //... other fields ...
}

And as far as I can tell from digging through the ES core, if I make a "request" like

    doc().field("variants.filter1")

I'll get the list of all unique values of the "filter1" field within the "variants" array.
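
The effect described above can be illustrated with a small Java sketch. It is a toy model, not Elasticsearch code: FlattenedVariants, flatten and the sample values are invented here, and it assumes that each sub-field of the array ends up as one multi-valued field with duplicates removed, as reported in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model, not Elasticsearch code: collapses an array of objects into per-field value sets.
class FlattenedVariants {

    // Collects every value of every sub-field under "<arrayName>.<field>", dropping duplicates.
    static Map<String, TreeSet<String>> flatten(String arrayName, List<Map<String, String>> objects) {
        Map<String, TreeSet<String>> byField = new LinkedHashMap<String, TreeSet<String>>();
        for (Map<String, String> object : objects) {
            for (Map.Entry<String, String> entry : object.entrySet()) {
                String path = arrayName + "." + entry.getKey();
                TreeSet<String> values = byField.get(path);
                if (values == null) {
                    values = new TreeSet<String>();
                    byField.put(path, values);
                }
                values.add(entry.getValue());
            }
        }
        return byField;
    }

    // Builds one variant with two filter values (sample data invented for this sketch).
    static Map<String, String> variant(String filter1, String filter2) {
        Map<String, String> v = new LinkedHashMap<String, String>();
        v.put("filter1", filter1);
        v.put("filter2", filter2);
        return v;
    }

    public static void main(String[] args) {
        List<Map<String, String>> variants = new ArrayList<Map<String, String>>();
        variants.add(variant("10", "red"));
        variants.add(variant("20", "blue"));
        variants.add(variant("10", "green"));
        // The result no longer says which filter1 value belonged to which filter2 value.
        System.out.println(flatten("variants", variants));
    }
}
```

main prints {variants.filter1=[10, 20], variants.filter2=[blue, green, red]}. All the values are there, but the information about which values belonged to the same variant is gone, so a script that needs whole variants cannot rebuild them from this view alone.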


So the question is: is it possible to make this list of objects accessible without loading and parsing the entire document source?

Looking forward to any feedback or thoughts on this.

Thanks.

Re: doc(), source() and "stored" property

Vadim Voituk
Sorry for bumping this, but I'm afraid my last question got lost in a long description :)

Here it is:
Is it possible to store a document's list of sub-objects together with the index (or load it into memory) so that it can be used inside a native filter without loading and parsing the entire source object?

Here is how it looks now in the native script's Java code:

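@Override
public Object run() {
    // source() loads and parses the document's entire _source
    final List<Map> vars = (List<Map>) source().get("variants");
    for (Map variant : vars) {
        // Navigate through and process filters one by one
    }
    return true; // placeholder; the actual filter decision is elided here
}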

The problem here is the source().get("variants") call: it's very slow because the "source" itself is about 100 KB.

Any suggestions on how to get "variants" into doc() or into memory?



Re: doc(), source() and "stored" property

oreno
Hi Vadim,
Any chance you found a solution for this issue, that is, a way to improve source() fetch performance?
I also need to retrieve the objects via the source() method, since that way the objects come back in
their original structure and I'm able to iterate over them and distinguish between each object's fields, instead of getting all their combined fields at once.
The problem is that it performs badly, as explained above.

Any news?

Thanks in advance,

Oren

Re: doc(), source() and "stored" property

AlexR
In reply to this post by Vadim Voituk
Hi Vadim,

I am not an Elasticsearch expert and do not have a solution for you, but maybe a workaround.
What if you store your rules as a JSON-encoded string and make a not-analyzed stored field out of it? You will need to parse it inside your script call, but you will not need to load your source.
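
A minimal Java sketch of this idea, under some loud assumptions: PackedRules, pack and unpack are hypothetical helpers, and a simple "|" and ";" delimited format stands in for JSON so that the example runs without a JSON library. The packed string would be written to its own stored, not_analyzed field at index time, and the script would read that one field (for example through fields(), as described earlier in the thread) and unpack it instead of calling source().

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the workaround: pack the rules into one string at index time, unpack it in the script.
// A "|" and ";" delimited format stands in for JSON so the example needs no JSON library.
class PackedRules {

    // Index-time side: one variant per ";"-separated record, one filter value per "|"-separated cell.
    static String pack(List<String[]> variants) {
        StringBuilder packed = new StringBuilder();
        for (int i = 0; i < variants.size(); i++) {
            if (i > 0) {
                packed.append(';');
            }
            String[] filters = variants.get(i);
            for (int j = 0; j < filters.length; j++) {
                if (j > 0) {
                    packed.append('|');
                }
                packed.append(filters[j]);
            }
        }
        return packed.toString();
    }

    // Script-time side: restores the per-variant grouping from the single stored string.
    static List<String[]> unpack(String packed) {
        List<String[]> variants = new ArrayList<String[]>();
        if (packed.isEmpty()) {
            return variants;
        }
        for (String record : packed.split(";", -1)) {
            variants.add(record.split("\\|", -1));
        }
        return variants;
    }

    public static void main(String[] args) {
        List<String[]> variants = new ArrayList<String[]>();
        variants.add(new String[] {"10", "red"});
        variants.add(new String[] {"20", "blue"});
        String stored = pack(variants);
        System.out.println(stored);                   // the single string kept in the stored field
        System.out.println(unpack(stored).get(1)[1]); // second variant, second filter value
    }
}
```

The values must not contain the delimiter characters; real JSON encoding, as suggested above, avoids that restriction at the cost of a parser dependency. Whether reading one small stored field is actually faster than source() for a given index would need to be measured.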


Re: doc(), source() and "stored" property

oreno
Hi Alex,
I will try just about anything that might improve the performance of source() fetching at this point. Do you have an idea of how this can be configured?

Thanks,

Oren


Re: doc(), source() and "stored" property

paolociccarese
This post has NOT been accepted by the mailing list yet.
Hi Vadim,
I am creating a native script and am running into the same scenario (a list of objects): while I can access the properties fine with source(), I can't get it to work with doc().

I was wondering if you found a way to make doc() work.
Anything you can share?

Thank you,
Paolo