Children aggregation (1.4.0.Beta1) Round-Robin result


Vlad Vlaskin
Dear ES group,
we've been using ES in production for a while and eagerly test new features as they come out, such as the cardinality aggregation.

We are trying to model our data with parent-child relations (ES version 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, plenty of RAM, etc.)
with the following data model:
Parent
{
  "key": "value"  
}

and a timeline of child documents holding metrics:

Child (type "metrics")
{
  "day": "2014-10-20",
  "count": 10
}

We update the metric documents and index them with script + upsert.
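A minimal sketch of the script + upsert update mentioned above, using the 1.x `_update` endpoint with the same inline-script style as the curl steps later in this thread. The index name INDEX-NAME, the child and parent ids, and the helpers `update_body`/`send_update` are placeholders, not part of the original setup; it also assumes dynamic scripting is enabled. If the child doc does not exist yet, the `upsert` document is indexed as-is; otherwise the script adds `ct` to `count`.

```shell
# Hypothetical helper: prints an _update body that adds $2 to "count" of an
# existing "metrics" child doc, or creates it with day=$1 and count=$2 if missing.
update_body() {
  printf '{"script":"ctx._source.count += ct","params":{"ct":%s},"upsert":{"day":"%s","count":%s}}\n' \
    "$2" "$1" "$2"
}

# Hypothetical helper: sends the update for child id $1 under parent id $2,
# with day $3 and increment $4. Assumes a reachable ES 1.x node on localhost:9200.
send_update() {
  curl -s -XPOST "http://localhost:9200/INDEX-NAME/metrics/$1/_update?parent=$2" \
    -d "$(update_body "$3" "$4")"
}
```

For example, `send_update child-1 parent-1 2014-10-20 10` would create or increment the metrics doc `child-1` under parent `parent-1` (both ids hypothetical).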
The problem is that the query below returns two different results in a round-robin fashion:
the first call returns one result, the next call returns the other, then the first again, and so on.

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "MY_FIELD": {
            "terms": {
                "field": "FIELD-XYZ"             // parent term aggregation 
            },
            "aggs": {
                "children": {
                    "children": {
                        "type": "metrics"        // child aggregation of type "metrics"
                    },
                    "aggs": {
                        "requests": {
                            "sum": {
                                "field": "count" // target aggregation within child documents
                            } 
                        }
                    }
                }
            }
        }
    }
}

Result A:
"aggregations": {
      "MY_FIELD": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "xx",
               "doc_count": 283322,
               "children": {
                  "doc_count": 3740372,
                  "requests": {
                     "value": 5801652297
                  }
               }
            }
         ]
      }
   }

Result B:
"aggregations": {
      "MY_FIELD": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "xx",
               "doc_count": 302421,
               "children": {
                  "doc_count": 1877361,
                  "requests": {
                     "value": 2965346170
                  }
               }
            }
         ]
      }
   }

The alternation between A and B is stable and reproducible.
The ES logs show no errors.

Does anyone have an idea what could be causing this?

Thank you!

Vlad

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6c948f61-0dce-4a62-b6ce-22b6a83aeaca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Martijn v Groningen
Hi Vlad,

I see that the doc_count is also different between the requests. Is the actual bucket key also different between A and B?

Martijn 

On 21 October 2014 03:18, Vlad Vlaskin <[hidden email]> wrote:

--
Kind regards,

Martijn van Groningen


Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Vlad Vlaskin
Hi Martijn,

The bucket key of the parent terms aggregation is the same in both results.

To make the explanation simpler, today I tried two queries:

Query A: Sum of the field "count" in child documents directly.

GET: INDEX-NAME/child/_search

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "requests": {
            "sum": {
                "field": "count"
            }
        }
    }
}


Query B: Sum of the field "count" through parent documents.
GET: INDEX-NAME/_search (we query all doc types here)

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "child": {
            "children": {
                "type": "child" 
            },
            "aggs": {
                "requests": {
                    "sum": {
                        "field": "count"
                    }
                }
            }
        }
    }
}

I expected these numbers to be about the same, but they differ by a factor of about 210:

Result from query A: 

 "hits": {
      "total": 4614829,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "requests": {
         "value": 53364274 // numbers make sense
      }
   }


Result from query B:

"hits": {
      "total": 4908110,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "child": {
         "doc_count": 13267677,
         "requests": {
            "value": 11208150231   // numbers does not make any sense
         }
      }
   }

I just want to understand whether this is a problem with the feature (the parent-child aggregation) or something wrong with our data modeling.


Thank you  

On Tuesday, October 21, 2014 10:03:07 AM UTC+2, Martijn v Groningen wrote:

Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Vlad Vlaskin
After some experiments, I believe I have found the cause of the discrepancy:

Elasticsearch does not detach the old versions of a child document from the children aggregation after the document has been updated, so they are still counted.

For example, if a child document is updated 4 times with a script (within a bulk update), it has 4 versions:
{ "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}

Querying the child document (after a refresh) returns the correct version: {"count": 4}

But the children aggregation {"sum":{"field":"count"}} returns 10, because:

1 + 2 + 3 + 4 = 10

The pattern holds exactly (e.g. with 5 versions you get 15).

This explains the behavior I reported.
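The "arithmetic progression" can be checked directly. Assuming, as in the example, that the script adds 1 per update and the first version has count = 1, the n stored versions hold 1, 2, ..., n. Summing all of them instead of only the latest gives the triangular number below, while the correct result is just n:

```latex
\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad n = 4 \;\Rightarrow\; 10, \qquad n = 5 \;\Rightarrow\; 15
```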





On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:

Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Martijn v Groningen
Hi Vlad,

What you're describing shouldn't happen. The child docs should get detached. I think this is a bug.
Let me verify and get back to you.

Martijn

On 21 October 2014 13:26, Vlad Vlaskin <[hidden email]> wrote:



--
Kind regards,

Martijn van Groningen


Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Vlad Vlaskin
Hi Martijn,

A couple of hours ago I tried to file a bug in the ES GitHub issues, and while writing the reproduction steps I realized one more thing.

It only happens if you update the same child document several times within one bulk request.

I didn't manage to reproduce the "arithmetic progression" effect by curling my localhost, but it is still reproducible from Java code doing a bulk update (script + upsert doc).
I understand that updating the same document several times in one bulk request is a pretty ugly thing to do,
and I was surprised that it worked normally (without version conflict exceptions) from the Java client.

In case it is helpful, these are the steps and queries to reproduce the parent-child setup on localhost with curl.
Unfortunately, I don't know how to send bulk updates with curl.


#Create index "test" with parent-child mappings
curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'

#Index parent document:
curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

#Index child document:
curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
#Update child document:
curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
#Refresh so that the update is visible to search:
curl -XPOST localhost:9200/test/_refresh
#Query with benchmark query, expected 2
curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
#Query with children aggregation, expected 2 (searches all types, because the children aggregation starts from the matching parent docs)
curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
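The message above mentions not knowing how to send bulk updates with curl. A minimal sketch, assuming the `test` index and `metric`/`root` types from the steps above and the 1.x bulk format (an action line followed by a body line, each newline-terminated): the hypothetical `bulk_body` helper emits n scripted updates of the same child doc, which is the pattern reported to trigger the problem. The body is posted with `--data-binary`, because `-d @-` would strip the newlines the bulk API relies on.

```shell
# Hypothetical helper: prints a bulk body with $3 scripted updates of child $1
# (parent $2) in index "test", type "metric". Each update adds 1 to "count";
# the upsert creates the doc with count=1 if it does not exist yet.
bulk_body() {
  id=$1 parent=$2 n=$3 i=1
  while [ "$i" -le "$n" ]; do
    # Action/metadata line; _parent routes the update to the parent's shard.
    printf '{"update":{"_index":"test","_type":"metric","_id":"%s","_parent":"%s"}}\n' "$id" "$parent"
    # Body line: script + upsert, like the Java bulk update described above.
    printf '{"script":"ctx._source.count+=ct","params":{"ct":1},"upsert":{"count":1}}\n'
    i=$((i + 1))
  done
}

# Sends the bulk request; assumes a reachable ES 1.x node on localhost:9200.
send_bulk() {
  bulk_body "$@" | curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @-
}
```

For example, after indexing parent 1 as above, `send_bulk 2 1 4` followed by a refresh would write child 2 four times in one request (count 1, 2, 3, 4). If the behavior described in this thread applies, the children aggregation would then report 10 instead of 4 for that doc.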


Thank you

On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:

Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Martijn v Groningen
Hi Vlad, 

I reproduced it. The children agg doesn't take documents marked as deleted into account properly.

When documents are deleted, they are first marked as deleted and only removed from the index later. This also applies to updates, because an update translates into an index + delete.
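This explanation can be illustrated with a toy model (plain shell and awk, not Elasticsearch code): each update of a child doc leaves the previous version in the index, merely flagged as deleted, until a merge removes it. Summing every stored version reproduces the inflated numbers from this thread, while honoring the deleted flag gives the expected value. The helper names `versions`, `sum_all` and `sum_live` are hypothetical.

```shell
# Toy model, not Elasticsearch internals: one line per stored version of a
# single child doc, as "<count> <live>", where live=0 means marked as deleted.
# Assumes the doc was written n times with count = 1, 2, ..., n.
versions() {
  n=$1 i=1
  while [ "$i" -le "$n" ]; do
    if [ "$i" -lt "$n" ]; then
      echo "$i 0"   # superseded by a later update: marked as deleted
    else
      echo "$i 1"   # latest version: the only live one
    fi
    i=$((i + 1))
  done
}

# Buggy behavior: sums "count" over every stored version, deleted or not.
sum_all() { awk '{ s += $1 } END { print s }'; }

# Expected behavior: skips versions that are marked as deleted.
sum_live() { awk '$2 == 1 { s += $1 } END { print s }'; }

versions 4 | sum_all    # prints 10
versions 4 | sum_live   # prints 4
```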

The issue you're experiencing can also happen when not using the bulk api. It may just be a bit less likely to manifest.

The fix for this bug is small. I'll open a PR soon.

Martijn

On 21 October 2014 15:51, Vlad Vlaskin <[hidden email]> wrote:
Hi Martijn,

Couple hours age I tried to submit a bug on ES Github issues and during creating steps of reproduce realized one more thing.

It happens only if you update the same child document within one bulk request.

Because I didn't manage to reproduce the "arithmetic progression" effect with curling my localhost, but it is still reproducible from java code doing bulk-update (script + upsert doc). 
I understand that bulk-updating the same document is a pretty ugly thing 
and I was surprised when it worked normally (without exceptions about version conflicts) from java client. 

If it might be helpful: these are the steps and queries to curl your localhost with parent-child.
Unfortunately I don't know how to create a curl with bulk updates. 


     #Create index "test" with parent-cild mappings
curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
 
#Index parent document:
curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

#Index child document:
curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
#Update child document:
curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
#Query with benchmark query, it should return 2
curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
#Query with child aggregation query, exepected 2
curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'


Thank you

On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad,

What you're describing shouldn't happen. The child docs should get detached. I think this is a bug.
Let me verify and get back to you.

Martijn

On 21 October 2014 13:26, Vlad Vlaskin <[hidden email]> wrote:
After some experiments I believe I found the cause of the discrepancy problem:

ElasticSearch does not detach child object after it has been updated from parent child aggregation and uses it in child aggregation. 

E.g. I have my child updated 4 times with script (within batch update), and it has 4 versions:
{ "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}

Query to the child document (after refresh) shows you proper version: {"count": 4}

But child aggregation {"sum":{"field":"count"}} shows you 10, because:

1 + 2 +3 +4 = 10

It works pretty accurate (e.g. for 5 you have 15). 

It explains the behavior here.






Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Vlad Vlaskin
Hi Martijn,

Great news, thank you!

Would you recommend keeping the parent-child data model and waiting for a release? (Do you have a rough idea of the date?)

Thank you

Vlad



On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad, 

I reproduced it. The children agg doesn't take documents marked as deleted into account properly.

When documents are deleted, they are first marked as deleted before they're actually removed from the index. This also applies to updates, because an update translates into an index + delete. 

The issue you're experiencing can also happen when not using the bulk API. It may just be a bit less likely to manifest.

The fix for this bug is small. I'll open a PR soon.

Martijn
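To illustrate the explanation above, here is a toy model in Python. It is not Elasticsearch's or Lucene's actual code: it assumes a single segment where documents are only appended and a delete merely flips a "live" flag, which is conceptually what Lucene's live-docs bitset does. `Segment` and `children_sum` are hypothetical names; `honor_deletes=False` mimics an aggregation that does not take deleted documents into account.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: docs are append-only and a delete only flips a live flag."""
    docs: list = field(default_factory=list)  # (doc_id, parent_id, count)
    live: list = field(default_factory=list)  # live[i] becomes False when doc i is deleted

    def index(self, doc_id, parent_id, count):
        self.docs.append((doc_id, parent_id, count))
        self.live.append(True)

    def update(self, doc_id, parent_id, count):
        # An update marks the current version as deleted and indexes a new one;
        # the old version stays in the segment until a merge removes it.
        for i, (d, _, _) in enumerate(self.docs):
            if d == doc_id and self.live[i]:
                self.live[i] = False
        self.index(doc_id, parent_id, count)


def children_sum(seg, parent_id, honor_deletes=True):
    """Sum `count` over the children of parent_id.

    honor_deletes=False mimics an aggregation that ignores deleted docs.
    """
    return sum(
        count
        for i, (_, p, count) in enumerate(seg.docs)
        if p == parent_id and (seg.live[i] or not honor_deletes)
    )


if __name__ == "__main__":
    seg = Segment()
    seg.index("1", "1", 1)       # child 1 of parent 1, count=1
    for value in (2, 3, 4):      # three scripted updates
        seg.update("1", "1", value)
    print(children_sum(seg, "1"), children_sum(seg, "1", honor_deletes=False))
```

Run as a script it prints `4 10`: honoring deletions gives the live value 4, while ignoring them sums all four versions, matching the numbers reported earlier in the thread.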

On 21 October 2014 15:51, Vlad Vlaskin <vl...@...> wrote:
Hi Martijn,

A couple of hours ago I tried to submit a bug on the ES GitHub issues, and while writing the reproduction steps I realized one more thing.

It happens only if you update the same child document within one bulk request.

I didn't manage to reproduce the "arithmetic progression" effect with curl against my localhost, but it is still reproducible from Java code doing a bulk update (script + upsert doc). 
I understand that updating the same document several times in one bulk request is a pretty ugly thing to do, 
and I was surprised that it worked normally (without version-conflict exceptions) from the Java client. 

In case it helps, these are the steps and queries to reproduce the parent-child setup with curl against localhost.
Unfortunately I don't know how to send bulk updates with curl. 
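For sending bulk updates with curl, a minimal sketch that generates the newline-delimited `_bulk` body is shown below. It assumes the 1.x bulk format, where the update action metadata carries `_index`, `_type`, `_id` and `_parent`, and the update body accepts `script`, `params` and `upsert`; please check the bulk API documentation for your exact version. `bulk_update_body` and the file name `make_bulk.py` are hypothetical.

```python
import json


def bulk_update_body(index, doc_type, doc_id, parent_id, n_updates):
    """Build a _bulk body (NDJSON) that updates the same child doc n_updates times.

    Each item is a script + upsert update, mirroring the updates described in the thread.
    """
    lines = []
    for _ in range(n_updates):
        # action/metadata line; "_parent" also routes the child to its parent's shard
        action = {"update": {"_index": index, "_type": doc_type,
                             "_id": doc_id, "_parent": parent_id}}
        # request body: increment count by ct, or insert {"count": 1} if missing
        payload = {"script": "ctx._source.count+=ct",
                   "params": {"ct": 1},
                   "upsert": {"count": 1}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the bulk format needs a trailing newline


if __name__ == "__main__":
    print(bulk_update_body("test", "metric", "1", "1", 4), end="")
```

Saved with `python make_bulk.py > bulk.json`, the body could then be sent with `curl -XPOST localhost:9200/_bulk --data-binary @bulk.json`. `--data-binary` matters because `-d @file` strips the newlines the bulk format depends on. If the child does not exist yet, the first action inserts the upsert document (`count: 1`) and the next three run the script, producing versions 1 to 4.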



Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Martijn v Groningen
Hi Vlad,

I opened: https://github.com/elasticsearch/elasticsearch/pull/8180

Many thanks for reporting this issue!
Apart from this bug the parent/child model works well, so I recommend keeping it. I don't know exactly when the next 1.4 release will be out, but I expect it within a week or two.

Martijn 



Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Vlad Vlaskin
Hi Martijn,

Could you help with another question on this topic?

I read that ES keeps parent-child relations on the heap. Could this bug prevent some objects from being garbage-collected, i.e. cause a memory leak? 
And what happens if the heap is full but more parent-child relations keep coming in? 

The reason I'm asking is that our cluster (8 r3.xlarge nodes, etc.) went down after 2 days of updating parent-child relations. 
The index volume is tiny, but the number of child documents updated is huge. 

Thank you.

Vlad 


On Tuesday, October 21, 2014 4:38:55 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad,

I opened: https://github.com/elasticsearch/elasticsearch/pull/8180

Many thanks for reporting this issue!
Apart from this bug the parent/child model works well, so I recommend keeping it. I don't know exactly when the next 1.4 release will be out, but I expect it within a week or two.

Martijn 


On 21 October 2014 16:17, Vlad Vlaskin <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="fhRDveP5hE4J" onmousedown="this.href='javascript:';return true;" onclick="this.href='javascript:';return true;">vl...@...> wrote:
Hi Martijn,

great news, thank you!

Would you recommend to keep parent-child data model and wait for a release?  (Do you have a feeling of the date?).

Thank you

Vlad



On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad, 

I reproduced it. The children agg doesn't take documents marked as deleted into account properly.

When documents are deleted they are initially marked as deleted before they're removed from the index. This also applies to updates, because that translate into an index + delete. 

The issue you're experiencing can also happen when not using the bulk api. It may just be a bit less likely to manifest.

The fix for this bug is small. I'll open a PR soon.

Martijn

On 21 October 2014 15:51, Vlad Vlaskin <[hidden email]> wrote:
Hi Martijn,

Couple hours age I tried to submit a bug on ES Github issues and during creating steps of reproduce realized one more thing.

It happens only if you update the same child document within one bulk request.

Because I didn't manage to reproduce the "arithmetic progression" effect with curling my localhost, but it is still reproducible from java code doing bulk-update (script + upsert doc). 
I understand that bulk-updating the same document is a pretty ugly thing 
and I was surprised when it worked normally (without exceptions about version conflicts) from java client. 

If it might be helpful: these are the steps and queries to curl your localhost with parent-child.
Unfortunately I don't know how to create a curl with bulk updates. 


     #Create index "test" with parent-cild mappings
curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
 
#Index parent document:
curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

#Index child document:
curl -XPUT '<a href="http://localhost:9200/test/metric/1?parent=1" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Ftest%2Fmetric%2F1%3Fparent%3D1\46sa\75D\46sntz\0751\46usg\75AFQjCNEYh6ijDPErb1C2foGXG0qLjt58Ug';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Ftest%2Fmetric%2F1%3Fparent%3D1\46sa\75D\46sntz\0751\46usg\75AFQjCNEYh6ijDPErb1C2foGXG0qLjt58Ug';return true;">http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
#Update child document:
curl -XPOST '<a href="http://localhost:9200/test/metric/1/_update?parent=1" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Ftest%2Fmetric%2F1%2F_update%3Fparent%3D1\46sa\75D\46sntz\0751\46usg\75AFQjCNEPf_hTk3XL0nnrk3-ALrm0v-Rvlw';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Flocalhost%3A9200%2Ftest%2Fmetric%2F1%2F_update%3Fparent%3D1\46sa\75D\46sntz\0751\46usg\75AFQjCNEPf_hTk3XL0nnrk3-ALrm0v-Rvlw';return true;">http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
#Query with benchmark query, it should return 2
curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
#Query with child aggregation query, exepected 2
curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'


Thank you

On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad,

What you're describing shouldn't happen. The child docs should get detached. I think this is a bug.
Let me verify and get back to you.

Martijn

On 21 October 2014 13:26, Vlad Vlaskin <[hidden email]> wrote:
After some experiments I believe I found the cause of the discrepancy problem:

ElasticSearch does not detach child object after it has been updated from parent child aggregation and uses it in child aggregation. 

E.g. I have my child updated 4 times with script (within batch update), and it has 4 versions:
{ "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}

Query to the child document (after refresh) shows you proper version: {"count": 4}

But child aggregation {"sum":{"field":"count"}} shows you 10, because:

1 + 2 +3 +4 = 10

It works pretty accurate (e.g. for 5 you have 15). 

It explains the behavior here.





On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
Dear ES group,
we've been using ES in production for a while and test eagerly all new-coming features such as cardinality and others.

We try data modeling with parent-child relations (ES version 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, ssd, lot ram etc.)
With data model of: 
Parent
{
  "key": "value"  
}

and a timeline with children, holding metrics:

Child (type "metrics")
{
 "day": "2014-10-20",
  "count: 10
}

We update metric documents and properly index them with script+upsert.
The problem is that the query below yields in 2 different results in round robin way. 
E.g. first time you call it you receive the first number, a second after you receive the second and again back to the first, etc. 

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "MY_FIELD": {
            "terms": {
                "field": "FIELD-XYZ"             // parent term aggregation 
            },
            "aggs": {
                "children": {
                    "children": {
                        "type": "metrics"        // child aggregation of type "metrics"
                    },
                    "aggs": {
                        "requests": {
                            "sum": {
                                "field": "count" // target aggregation within child documents
                            } 
                        }
                    }
                }
            }
        }
    }
}

 Result A: 
"aggregations": {
      "MY_FIELD": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "xx",
               "doc_count": 283322,
               "children": {
                  "doc_count": 3740372,
                  "requests": {
                     "value": 5801652297
                  }
               }
            }
         ]
      }
   }

Result B:
"aggregations": {
      "MY_FIELD": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "xx",
               "doc_count": 302421,
               "children": {
                  "doc_count": 1877361,
                  "requests": {
                     "value": 2965346170
                  }
               }
            }
         ]
      }
   }

The problem is that switching A to B back and forth is pretty stable and reproducible. 
ES logs are clear. 

Could someone help towards some ideas here?

Thank you!

Vlad

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com?utm_medium=email&amp;utm_source=footer" target="_blank" onmousedown="this.href='https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com?utm_medium\75email\46utm_source\75footer';return true;" onclick="this.href='https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com?utm_medium\75email\46utm_source\75footer';return true;">https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com.

For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" onmousedown="this.href='https://groups.google.com/d/optout';return true;" onclick="this.href='https://groups.google.com/d/optout';return true;">https://groups.google.com/d/optout.



--
Met vriendelijke groet,

Martijn van Groningen



--
Met vriendelijke groet,

Martijn van Groningen



--
Met vriendelijke groet,

Martijn van Groningen


Re: Children aggregation (1.4.0.Beta1) Round-Robin result

Martijn v Groningen
I missed this email... The `children` agg relies on field data, and field data grows as your data grows. Technically, field data is tied to the Lucene segments of each Lucene index (shard): as segments are created and removed, so are the corresponding field data entries. If there is no more heap available for field data, the circuit breaker kicks in and fails requests that try to load field data that no longer fits into the heap.
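To make the per-segment accounting concrete, here is a minimal Python model of a field data circuit breaker. It is a simplified sketch, not Elasticsearch's implementation: the class name, the `overhead` multiplier and the byte figures are hypothetical, and the real breaker's settings and size estimates differ.

```python
class CircuitBreakingError(Exception):
    """Raised when loading field data would exceed the breaker limit."""


class FieldDataBreaker:
    """Toy model of a field data circuit breaker (hypothetical, not ES code).

    Field data is loaded per Lucene segment. The breaker tracks the bytes
    held by loaded segments and rejects any load that would push the total
    over the limit, instead of letting the heap run out.
    """

    def __init__(self, limit_bytes: int, overhead: float = 1.0) -> None:
        self.limit_bytes = limit_bytes
        self.overhead = overhead  # assumed safety multiplier on each estimate
        self.loaded: dict[str, int] = {}  # segment name -> bytes held

    @property
    def used_bytes(self) -> int:
        return sum(self.loaded.values())

    def load_segment(self, segment: str, estimated_bytes: int) -> None:
        estimate = int(estimated_bytes * self.overhead)
        if self.used_bytes + estimate > self.limit_bytes:
            # The request fails; nothing is loaded for this segment.
            raise CircuitBreakingError(
                f"loading {segment} needs {estimate} bytes, "
                f"{self.used_bytes}/{self.limit_bytes} already in use"
            )
        self.loaded[segment] = estimate

    def segment_removed(self, segment: str) -> None:
        # When a segment is merged away, its field data entry is released too.
        self.loaded.pop(segment, None)


breaker = FieldDataBreaker(limit_bytes=1_000)
breaker.load_segment("_0", 600)
try:
    breaker.load_segment("_1", 500)  # 600 + 500 > 1000, so this is rejected
except CircuitBreakingError as err:
    print("rejected:", err)
breaker.segment_removed("_0")     # e.g. "_0" was merged into another segment
breaker.load_segment("_1", 500)   # now it fits
print(breaker.used_bytes)         # 500
```

The point of the model is that the breaker rejects a load up front rather than letting it exhaust the heap, and that memory is only given back when segments go away.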

How large is your index? Did you check with the stats APIs how much heap memory field data is actually taking? (For example, the node stats API reports this for each node in your cluster.)
Maybe something else is using the heap, but that is difficult to tell without looking at the stats API output or at a heap dump.
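As a sketch of that check, the Python snippet below reads per-node field data memory from the node stats API. It assumes the 1.x endpoint `/_nodes/stats/indices/fielddata` and a response shaped like `nodes.<id>.indices.fielddata.memory_size_in_bytes`; verify both against your version. The sample response and its numbers are made up for illustration.

```python
import json
import urllib.request


def fetch_fielddata_stats(host: str = "http://localhost:9200") -> dict:
    """Fetch node stats restricted to the field data section (assumed endpoint)."""
    url = f"{host}/_nodes/stats/indices/fielddata"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fielddata_by_node(stats: dict) -> dict[str, int]:
    """Map each node name (or id, if unnamed) to the heap bytes used by field data."""
    return {
        node.get("name", node_id): node["indices"]["fielddata"]["memory_size_in_bytes"]
        for node_id, node in stats["nodes"].items()
    }


# Made-up response, trimmed to the fields used above.
sample = {
    "nodes": {
        "abc123": {"name": "node-1",
                   "indices": {"fielddata": {"memory_size_in_bytes": 734003200,
                                             "evictions": 0}}},
        "def456": {"name": "node-2",
                   "indices": {"fielddata": {"memory_size_in_bytes": 12582912,
                                             "evictions": 3}}},
    }
}
for name, size in sorted(fielddata_by_node(sample).items()):
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Against a live cluster you would call `fielddata_by_node(fetch_fielddata_stats())` instead of using `sample`. If the totals are far below the heap size, the heap is probably being used by something else.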

Martijn

On 22 October 2014 11:00, Vlad Vlaskin <[hidden email]> wrote:
Hi Martijn,

Could you help with another question related to this topic?

I read that ES keeps parent-child relations on the heap. Could this bug prevent some objects from being garbage-collected, i.e. cause a memory leak?
And what happens if there is no more heap left but more parent-child relations keep coming in?

The reason I'm asking is that our cluster (8 r3.xlarge nodes, etc.) went down after 2 days of updating parent-child relations.
The index volume is tiny, but the number of child document updates is huge.

Thank you.

Vlad 


On Tuesday, October 21, 2014 4:38:55 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad,


Many thanks for reporting this issue!
Apart from this bug, the parent/child model works well, so I recommend keeping it. I don't know exactly when the next 1.4 release will be out, but I expect it within a week or two.

Martijn 


On 21 October 2014 16:17, Vlad Vlaskin <[hidden email]> wrote:
Hi Martijn,

Great news, thank you!

Would you recommend keeping the parent-child data model and waiting for a release? (Do you have a sense of the date?)

Thank you

Vlad



On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad, 

I reproduced it. The children agg doesn't take documents marked as deleted into account properly.

When documents are deleted, they are first marked as deleted and only removed from the index later. This also applies to updates, because an update translates into an index + delete.
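A toy Python model of this mechanism may help: an update marks the old version of a document as deleted and appends a new one, and an aggregation that does not check the deleted flags counts every version. The `Segment` class and the `honor_deletes` switch are hypothetical; `honor_deletes=False` only mimics the reported bug and does not reflect how Elasticsearch or Lucene is structured internally.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: docs are only appended; a delete just sets a flag."""
    docs: list[tuple[str, str, int]] = field(default_factory=list)  # (id, parent, count)
    deleted: set[int] = field(default_factory=set)  # positions marked as deleted

    def index(self, doc_id: str, parent_id: str, count: int) -> None:
        self.docs.append((doc_id, parent_id, count))

    def update(self, doc_id: str, parent_id: str, count: int) -> None:
        # An update = mark the current live version as deleted + index a new one.
        for pos, (existing_id, _, _) in enumerate(self.docs):
            if existing_id == doc_id and pos not in self.deleted:
                self.deleted.add(pos)
        self.index(doc_id, parent_id, count)

    def children_sum(self, parent_id: str, honor_deletes: bool = True) -> int:
        """Sum `count` over the children of one parent."""
        total = 0
        for pos, (_, pid, count) in enumerate(self.docs):
            if pid != parent_id:
                continue
            if honor_deletes and pos in self.deleted:
                continue  # skip versions that are marked as deleted
            total += count
        return total


seg = Segment()
seg.index("1", "1", 1)        # child "1" of parent "1" with count 1
for new_count in (2, 3, 4):   # three scripted updates: count += 1
    seg.update("1", "1", new_count)
print(seg.children_sum("1"))                       # 4
print(seg.children_sum("1", honor_deletes=False))  # 10 = 1 + 2 + 3 + 4
```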

The issue you're experiencing can also happen when you're not using the bulk API; it may just be a bit less likely to manifest.

The fix for this bug is small. I'll open a PR soon.

Martijn

On 21 October 2014 15:51, Vlad Vlaskin <[hidden email]> wrote:
Hi Martijn,

A couple of hours ago I tried to submit a bug to the ES GitHub issues, and while writing the steps to reproduce it I realized one more thing.

It happens only if you update the same child document within one bulk request.

I didn't manage to reproduce the "arithmetic progression" effect by curling my localhost, but it is still reproducible from Java code doing a bulk update (script + upsert doc).
I understand that bulk-updating the same document is a pretty ugly thing,
and I was surprised that it worked normally (without version conflict exceptions) from the Java client.

In case it helps, these are the steps and queries to reproduce the parent-child setup with curl against localhost.
Unfortunately I don't know how to send bulk updates with curl.


#Create index "test" with parent-child mappings:
curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'

#Index parent document:
curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

#Index child document:
curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
#Update child document:
curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
#Refresh so that the searches below see the latest changes:
curl -XPOST localhost:9200/test/_refresh
#Reference query (plain sum aggregation), expected 2:
curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
#Query with children aggregation, expected 2:
curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
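Since the problem only shows up when the same child is updated several times in one bulk request, here is a hedged Python sketch for producing such a request body, which can then be sent with curl. `build_bulk_updates` is a hypothetical helper; it assumes the 1.x bulk format, where the action line carries `_index`, `_type`, `_id` and `_parent`, and the update body uses a top-level `params` object as in the `_update` call above.

```python
import json


def build_bulk_updates(index: str, doc_type: str, doc_id: str,
                       parent_id: str, times: int, increment: int = 1) -> str:
    """Return a newline-delimited _bulk body that applies the same scripted
    update `times` times to one child document (hypothetical helper)."""
    action = {"update": {"_index": index, "_type": doc_type,
                         "_id": doc_id, "_parent": parent_id}}
    payload = {"script": "ctx._source.count+=ct", "params": {"ct": increment}}
    lines: list[str] = []
    for _ in range(times):
        lines.append(json.dumps(action))   # action/metadata line
        lines.append(json.dumps(payload))  # update body line
    # Every line of a bulk body, including the last one, must end with "\n".
    return "\n".join(lines) + "\n"


print(build_bulk_updates("test", "metric", "1", "1", times=3), end="")
```

Saving that output as `bulk.json`, it could be sent with `curl -XPOST localhost:9200/_bulk --data-binary @bulk.json`. `--data-binary` matters here because `-d @file` strips the newlines that the bulk format relies on.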


Thank you

On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
Hi Vlad,

What you're describing shouldn't happen. The child docs should get detached. I think this is a bug.
Let me verify and get back to you.

Martijn

On 21 October 2014 13:26, Vlad Vlaskin <[hidden email]> wrote:
After some experiments, I believe I have found the cause of the discrepancy:

Elasticsearch does not detach a child document from the parent-child aggregation after the child has been updated, and keeps using the old versions in the children aggregation.

For example, my child document is updated 4 times with a script (within a bulk update), so it has 4 versions:
{ "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}

Querying the child document (after a refresh) shows the correct version: {"count": 4}

But the children aggregation with {"sum":{"field":"count"}} returns 10, because:

1 + 2 + 3 + 4 = 10

This holds consistently (e.g. with 5 versions you get 15).
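Written out, the pattern is the sum of an arithmetic progression: if a child ends up with n versions whose counts are 1, 2, ..., n, an aggregation that also counts the deleted versions returns the sum of all of them instead of just the latest value n:

```latex
S_{\text{observed}}(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad S_{\text{expected}}(n) = n,
\qquad S_{\text{observed}}(4) = 10,\quad S_{\text{observed}}(5) = 15.
```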

This explains the behavior I reported.





On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
Dear ES group,
we've been using ES in production for a while and eagerly test all new features, such as cardinality and others.

We are trying out data modeling with parent-child relations (ES version 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.).
The data model is:
Parent
{
  "key": "value"
}

and a timeline of children holding metrics:

Child (type "metrics")
{
  "day": "2014-10-20",
  "count": 10
}

We update the metric documents and index them properly with script + upsert.
The problem is that the query below returns 2 different results in a round-robin fashion:
the first time you call it you get the first number, a second later you get the second, then the first again, and so on.

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "MY_FIELD": {
            "terms": {
                "field": "FIELD-XYZ"             // parent term aggregation 
            },
            "aggs": {
                "children": {
                    "children": {
                        "type": "metrics"        // child aggregation of type "metrics"
                    },
                    "aggs": {
                        "requests": {
                            "sum": {
                                "field": "count" // target aggregation within child documents
                            } 
                        }
                    }
                }
            }
        }
    }
}

Result A:
"aggregations": {
      "MY_FIELD": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "xx",
               "doc_count": 283322,
               "children": {
                  "doc_count": 3740372,
                  "requests": {
                     "value": 5801652297
                  }
               }
            }
         ]
      }
   }

Result B:
"aggregations": {
      "MY_FIELD": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "xx",
               "doc_count": 302421,
               "children": {
                  "doc_count": 1877361,
                  "requests": {
                     "value": 2965346170
                  }
               }
            }
         ]
      }
   }

The switching between A and B is stable and reproducible.
The ES logs show nothing unusual.

Could someone suggest some ideas here?

Thank you!

Vlad




--
Met vriendelijke groet,

Martijn van Groningen




--
Met vriendelijke groet,

Martijn van Groningen



--
Met vriendelijke groet,

Martijn van Groningen



--
Met vriendelijke groet,

Martijn van Groningen
