has parent score type - BUG?

phill
I was glad to see the "score_type" feature added in 0.20.2, which allows
the parent's score to be transferred to the child.

http://www.elasticsearch.org/guide/reference/query-dsl/has-parent-query.html

But I think there is a bug when I use it.

I have not yet reduced it to a simple example, but I have done the
following:
* checked that all children have a routing value equal to their parent's ID.
* checked that all parents have a routing value equal to their own ID.
* run a particular query with and without "score_type": "score" in
the "has_parent" query.
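
A minimal sketch of the first check, assuming the child IDs, parent IDs and routing values have already been exported to a text file with one child per line (for instance by requesting the _parent and _routing fields in a search, if your mapping stores them). The function name and the input layout are hypothetical, not part of Elasticsearch.

```shell
# Print the ID of every child whose routing value differs from its parent ID.
# Input on stdin: one child per line as "<child_id> <parent_id> <routing>".
routing_mismatches() {
  # Appending "" forces a string comparison, so "01" and "1" count as different.
  awk '($2 "") != ($3 "") { print $1 }'
}
```

For example, `routing_mismatches < children.txt` prints nothing when every child is routed by its parent's ID.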

"score_type": "none" (or with the new "score_type" field omitted)
     I get all the documents I expect.
"score_type": "score"
     Sometimes I get 3 documents and sometimes 4 in the result.
     I just resubmit the same query in the head plugin and the count
changes between 3 and 4, with no consistent pattern.
     There are no node failures. I only have two shards.

Here is the most reduced query I could come up with that resembles what
I'd like to do, which is to find children that have parents, with some of
the scoring coming from the parents.
Of course, in my real query the criteria for the parents are more complex
than "match_all".

curl 'http://localhost:9200/myindex/MyChildType/_search' -d '
{
   "from": 0,
   "timeout": 4000,
   "query": {
     "filtered": {
       "query": {
         "bool": {
           "should": [
             {
               "has_parent": {
                 "query": {
                   "match_all": {}
                 },
                 "parent_type": "MjDocument",
                 "score_type": "none"
               }
             }
           ],
           "boost": 1000
         }
       },
       "filter": {
         "prefix": {
           "Path.NALocation": "Bugs\\Phrase Boosting\\Subphrase\\"
         }
       }
     }
   }
}'

With "score_type": "score" in place of "score_type": "none" in the query
above, I get fewer results: 9 expected, sometimes 3, sometimes 4.
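
One way to put numbers on the intermittent counts is to send the same request repeatedly and tally the reported totals. The sketch below assumes the query body has been saved to a file (query.json is a hypothetical name) and that the response is the compact, non-pretty form in which "hits":{"total":N appears on one line. The function names are made up for illustration.

```shell
# Pull hits.total out of a compact search response read from stdin.
# Assumes the response contains "hits":{"total":N on a single line.
extract_hits_total() {
  sed -n 's/.*"hits":{"total":\([0-9][0-9]*\).*/\1/p'
}

# Count how often each value occurs in a list of numbers, one per line.
tally() {
  sort -n | uniq -c
}

# Run the same search repeatedly and tally the totals it reports.
# $1 = search URL, $2 = file holding the JSON body, $3 = number of runs.
repeat_search() {
  i=0
  while [ "$i" -lt "$3" ]; do
    curl -s "$1" -d @"$2" | extract_hits_total
    i=$((i + 1))
  done | tally
}
```

For example, `repeat_search 'http://localhost:9200/myindex/MyChildType/_search' query.json 20` prints one line per distinct total with the number of runs that returned it. A stable query should produce a single line.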

I know that prefix queries are slow; I only use one here to select exactly
the set of files I was testing with. The filter works.

Am I using "has_parent" wrong, or is there something else wrong here?

Point 2:
The doc for this new feature needs some work.

"The supported score types are `score` or `none`. The default is `none`
and yields the same behavior as in previous versions. If the score type
is set to another value than `none`, the scores of all the matching
parent documents are aggregated into the associated child documents."
--
http://www.elasticsearch.org/guide/reference/query-dsl/has-parent-query.html

1. "same .. as in previous": how is a new user (or someone reviewing
existing behavior) supposed to know what the old behavior is or was?
2. "the scores of all the matching parent documents are aggregated into
the associated child documents": it looks like someone copied this from
the equivalent sentence in the "has_child" docs.
I assume no parent score_s_ are aggregated together, only that "... the
score of _each_ matching parent document is aggregated into the score
for each of its child documents."
Document_s_ to document_s_ doesn't actually say what goes where, and
could be replaced with "some scores are moved about" :-)

If someone can confirm points 1 and 2 and describe the expected "old"
behavior (for "none"), I'll even submit a pull request with the changes
to this page.

Also, if someone can think of a different or more efficient way to score
filtered children based at least partially on the score of their parents,
please suggest it.

-Paul

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.


Re: has parent score type - BUG?

Martijn v Groningen
Hi Paul,

Were you able to create a test case that fails more or less consistently? Also, have you tried running your has_parent query with the latest 0.20 release or 0.90.x?

Yes, the has_parent docs need to be corrected. The "old behavior" is that parent scores aren't pushed to the child documents. Instead, the score is equal to the boost (defaults to 1) specified in the has_parent query. I'll update the documentation for `has_parent`.
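
To see that behavior for yourself, a small helper can generate has_parent request bodies with an explicit boost. This is only a sketch: the function name is hypothetical, and the body is a bare has_parent query over match_all rather than the filtered query from the original post.

```shell
# Print a has_parent search body on a single line.
# $1 = parent type, $2 = score_type ("none" or "score"), $3 = boost.
has_parent_body() {
  printf '{"query":{"has_parent":{"parent_type":"%s","score_type":"%s","boost":%s,"query":{"match_all":{}}}}}\n' "$1" "$2" "$3"
}
```

Going by the description above, sending `has_parent_body MjDocument none 2.0` to the child type's _search endpoint should give every hit the same _score, derived from the boost rather than from the parent.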

Martijn

(In reply to P. Hill's message of 13 March 2013 22:30.)

--
Kind regards,

Martijn van Groningen


Re: has parent score type - BUG?

Victor Zeng
This post has NOT been accepted by the mailing list yet.
In reply to this post by phill
Hey Phill,

It's been quite a few months, and I was wondering if you ever found a solution to this issue.

I have recently run into a very similar issue. Routing is required for my child docs (i.e. every child has a parent), and when I run a has_parent query with match_all against the child docs, I get a very different total for score_type 'none' vs 'score'.

I was able to fix the discrepancy by reloading my parent doctype and its documents with exactly the same configuration and data.

I was unable to fix it with refresh/flush/optimize.

Most importantly, I am not able to reproduce it. It seems to develop over time in a very mysterious way.

Version: 0.90.5