Parent/Child query performance in version 1.1.2

mark-2
We are experiencing slow parent/child queries, even when we run the same query a second time, and I wanted to know whether this is simply the limit of this feature within Elasticsearch. According to the ES docs (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html), parent/child queries can be 5-10x slower than the equivalent nested queries and consume a lot of memory.

My impression has been that, as long as we give ES enough memory for the field data cache, subsequent runs of a query would be quicker than its first execution. Instead, we are seeing the following query take ~16 seconds to complete every time.


{
    "from": 0,
    "size": 100,
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "oid": 61
                            }
                        },
                        {
                            "has_child": {
                                "type": "social",
                                "query": {
                                    "bool": {
                                        "should": [
                                            {
                                                "term": {
                                                    "engagement.type": "like"
                                                }
                                            },
                                            {
                                                "term": {
                                                    "content.remote_id": "20697868961_10152270678178962"
                                                }
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "fields": "id",
    "sort": [
        {
            "_score": {}
        },
        {
            "id": {
                "order": "asc"
            }
        }
    ]
}


The index we are testing this on has 5 shards with 1 replica, and holds 2.2 million parent documents and 1.1 million child documents.

We are running our two data nodes on r3.2xlarge instances, which have 8 CPUs, 60GB of RAM, and SSD storage.

Each ES data node has 30GB of heap. The field data cache is currently consuming only ~3GB, and there are no cache evictions. The field data cache is also allowed to grow to 75% of the available heap.
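To put those numbers in perspective, here is a quick back-of-the-envelope check in Python. It is only a sketch: it assumes "30G" means 30 GB and that the 75% limit applies to the whole heap. Under those assumptions the cache is using only a small fraction of what it is allowed, which is consistent with seeing no evictions.

```python
# Figures quoted in the post; treating "30G" as 30 GB is an assumption.
heap_gb = 30.0
fielddata_limit_fraction = 0.75   # cache is allowed to grow to 75% of the heap
fielddata_used_gb = 3.0           # approximate current usage

# Maximum size the field data cache may reach, and how much of it is in use.
limit_gb = heap_gb * fielddata_limit_fraction
used_fraction_of_limit = fielddata_used_gb / limit_gb

print(f"field data cache limit: {limit_gb:.1f} GB")
print(f"currently used: {used_fraction_of_limit:.0%} of that limit")
```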

I'm looking to understand whether this is a limitation of parent/child, or whether there is additional configuration beyond the defaults that would help speed these queries up.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9563cc90-21df-42ca-9eb1-aab4520db871%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Parent/Child query performance in version 1.1.2

mark-2
I wanted to update the list with an interesting piece of information. We found that when we took one of our two data nodes out of the cluster, leaving just one data node with no replicas, query performance improved dramatically. The queries now return in <100ms on subsequent executions, which is what we'd expect to see when the data is held in the field data cache.

Is it possible that there is some kind of inefficient code path when a query is spread across primary and replica shards?

On Thursday, August 21, 2014 3:53:40 PM UTC-4, Mark Greene wrote:
[...]

Re: Parent/Child query performance in version 1.1.2

Adrien Grand-2
Hi Mark,

Given that you had 1 replica in your first setup, it could take several queries to warm up the field data cache completely, since both the primary and the replica copy of each shard need to be warmed. Does the query still take 16 seconds if you run it e.g. 10 times? (3 should be enough, but just to be sure.)

Does it change anything if you query elasticsearch with preference=_local? This should be equivalent to your single-node setup, so it would be interesting to see if that changes something.
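For illustration, a minimal Python sketch of how such a request could be put together. The host, the index name `myindex`, and the helper `build_search_request` are assumptions made for this example, not details from this thread; the only point it shows is that `preference=_local` goes on the `_search` URL as a query parameter. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def build_search_request(host, index, body, preference=None):
    """Return (url, payload) for a _search call without sending anything."""
    params = {}
    if preference is not None:
        params["preference"] = preference
    url = f"{host}/{index}/_search"
    if params:
        url += "?" + urlencode(params)
    return url, json.dumps(body)


# Hypothetical host and index name; the body is a trivial placeholder query.
url, payload = build_search_request(
    "http://localhost:9200",
    "myindex",
    {"query": {"match_all": {}}, "size": 100},
    preference="_local",
)
print(url)
```

The same URL and payload can then be passed to whatever HTTP client is already in use.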

As a side note, you might want to try out a more recent version of Elasticsearch since parent/child performance improved quite significantly in 1.2.0 because of https://github.com/elasticsearch/elasticsearch/pull/5846



On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene <[hidden email]> wrote:
[...]

--
Adrien Grand


Re: Parent/Child query performance in version 1.1.2

mark-2
Hi Adrien,

Thanks for reaching out.

We were actually excited to see the performance improvements stated in the 1.2.0 release notes, so we upgraded to 1.3.2. We saw some performance improvement, but it wasn't orders of magnitude, and queries are still running very slowly.

We also tried your suggestion of using the 'preference=_local' query param but we didn't see any difference there. Additionally, running the query 10 times, we saw no improvement in speed.
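A repeated-run check like this can be scripted so that cold and warm timings are compared directly. The sketch below is only an illustration: `time_runs` is a hypothetical helper, and `fake_query` is a stand-in that sleeps briefly instead of calling Elasticsearch, so the example runs anywhere. In practice `run_query` would be a function that sends the real search request.

```python
import statistics
import time


def time_runs(run_query, repeats=10):
    """Call run_query() `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def fake_query():
    # Stand-in for the real search request (assumption for this example).
    time.sleep(0.01)


durations = time_runs(fake_query, repeats=10)
print(f"first run: {durations[0] * 1000:.1f} ms")
print(f"median of later runs: {statistics.median(durations[1:]) * 1000:.1f} ms")
```

If the later runs are not clearly faster than the first one, cache warm-up is unlikely to be the explanation for the slow query.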

Currently, the only major performance increase we've seen with parent/child queries comes from dropping down to 1 data node, at which point we see queries executing well under the 100ms mark.




On Friday, August 22, 2014 6:42:27 PM UTC-4, Adrien Grand wrote:
[...]

Re: Parent/Child query performance in version 1.1.2

Clinton Gormley-2
Something else to note: parent-child now uses global ordinals to make queries 3x faster than they were previously, but global ordinals need to be rebuilt after the index has refreshed (assuming some data has changed).

Currently there is no way to build p/c global ordinals "eagerly" (i.e. during the refresh phase), so the rebuild happens on the first query after a refresh. 1.3.3 and 1.4.0 will include an option to allow eager building of global ordinals, which should remove this latency spike: https://github.com/elasticsearch/elasticsearch/issues/7394

You may want to consider increasing the refresh_interval so that global ordinals remain valid for longer.
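As a rough sketch of what that change looks like, the Python snippet below builds the body for an update-settings request (`PUT /<index>/_settings`). The index name `myindex`, the host, and the `30s` value are assumptions chosen for illustration; the right interval depends on how fresh search results need to be. Nothing is sent over the network.

```python
import json


def refresh_interval_settings(interval):
    """Build the settings body that changes an index's refresh interval."""
    return {"index": {"refresh_interval": interval}}


# Hypothetical target: PUT http://localhost:9200/myindex/_settings
settings_url = "http://localhost:9200/myindex/_settings"
settings_body = json.dumps(refresh_interval_settings("30s"))
print(settings_url)
print(settings_body)
```

A longer interval means newly indexed documents take longer to become searchable, so this trades freshness for fewer global ordinal rebuilds.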


On 25 August 2014 16:48, Mark Greene <[hidden email]> wrote:
[...]

Re: Parent/Child query performance in version 1.1.2

mark-2
Hey Clinton,

Thanks for the heads up on what's on the horizon. That definitely sounds like a drastic improvement. That being said, my fear is that even with that improvement, this data model (parent/child) doesn't seem to be that performant with a moderate number of documents. For us to really adopt parent/child, we'd expect to see sub-100ms performance so long as we were giving ES enough RAM.

My hunch here is there must be some code path that is hit when running on more than 1 data node that either doesn't write to the cache or skips it on the read and hits the disk. We don't have a ton of load on our data nodes, CPU is well under 30% and IOWait is usually under 0.30.

Just to reiterate: when we run the parent/child query on one data node, it runs in less than 100ms; when it runs across two data nodes, it takes >10s. We see this on both version 1.1.2 and 1.3.2.

On Monday, August 25, 2014 10:55:15 AM UTC-4, Clinton Gormley wrote:
[...]

On Thursday, August 21, 2014 3:53:40 PM UTC-4, Mark Greene wrote:
We are experiencing slow parent/child queries even when we run the query a second time and I wanted to know if this is just the limit of this feature within ElasticSearch. According to the ES Docs (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html) parent/child queries can be 5-10x slower and consume a lot of memory.

My impression has been that as long as we give ES enough memory for the field data cache, subsequent executions of a query would be quicker than the first. We are seeing the following query take ~16 seconds to complete every time.


{
    "from": 0,
    "size": 100,
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "oid": 61
                            }
                        },
                        {
                            "has_child": {
                                "type": "social",
                                "query": {
                                    "bool": {
                                        "should": [
                                            {
                                                "term": {
                                                    "engagement.type": "like"
                                                }
                                            },
                                            {
                                                "term": {
                                                    "content.remote_id": "20697868961_10152270678178962"
                                                }
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "fields": "id",
    "sort": [
        {
            "_score": {}
        },
        {
            "id": {
                "order": "asc"
            }
        }
    ]
}


The index (which has 5 shards with 1 replica shard) we are testing this on has 2.2 million parent documents and 1.1 million child documents.

We are running our two data nodes on r3.2xlarge instances, which have 8 CPUs, 60GB of RAM, and SSD storage.

Our ES data nodes have 30GB of heap, and the field data cache is only consuming ~3GB right now with no cache evictions. The field data cache is also allowed to grow to 75% of the available heap.
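
As a rough illustration of how figures like these can be checked, the sketch below summarizes field data memory and evictions per node from a nodes-stats response (the kind of JSON returned by GET /_nodes/stats/indices/fielddata). The function name and the sample numbers are made up for the example; only the memory_size_in_bytes and evictions fields are taken from the stats format.

```python
def summarize_fielddata(node_stats):
    """Return {node_name: (memory_bytes, evictions)} from a nodes-stats response."""
    summary = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        name = node.get("name", node_id)  # fall back to the node id if unnamed
        summary[name] = (
            fielddata.get("memory_size_in_bytes", 0),
            fielddata.get("evictions", 0),
        )
    return summary


if __name__ == "__main__":
    # Illustrative response only; these values are not from the cluster in the thread.
    sample = {
        "nodes": {
            "abc123": {
                "name": "data-1",
                "indices": {"fielddata": {"memory_size_in_bytes": 1500, "evictions": 0}},
            }
        }
    }
    print(summarize_fielddata(sample))
```

A non-zero evictions count, or memory close to the configured cache limit, would suggest the cache is being churned rather than reused between queries.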

I'm looking to understand if this is a limitation with parent/child or is there additional configuration that has to be set beyond the defaults that would help speed these queries up?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a6442545-edc0-4e21-9696-925aae517762%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Adrien Grand


Re: Parent/Child query performance in version 1.1.2

mark-2
Just wanted to close the loop on this in case anyone stumbled upon the same issue.

After upgrading to version 1.3.2, which included the performance increase stemming from https://github.com/elasticsearch/elasticsearch/pull/5846, we saw a dramatic decrease in parent/child query latency. We're executing queries in under 150ms, which is manageable for now, and we will be eagerly awaiting further improvements from the work Clinton highlighted here: https://github.com/elasticsearch/elasticsearch/issues/7394.

Along the way, our testing got a little confusing because we attempted to troubleshoot on 1 data node in order to keep things simple. This led to some misplaced assumptions about the performance increases that came from the work released in 1.2.0. In our testing on a single node, we did _not_ observe a latency decrease at all when going from 1.1.2 to 1.3.2. However, when we changed our test cluster to use two data nodes, we saw a huge improvement. So my earlier assertion about not seeing those improvements in version 1.3.2 was incorrect, although I'm still confused as to why a single-node configuration was not benefiting.

In any case, I wanted to thank the ES developers for being generous with their time in helping us track this issue down. Now that I realize the incredible pace at which ES versions are released, we'll be much more vigilant about keeping up.

Thanks again!


On Monday, August 25, 2014 11:32:38 AM UTC-4, Mark Greene wrote:
Hey Clinton,

Thanks for the heads up on what's on the horizon. That definitely sounds like a drastic improvement. That being said, my fear here is that even with that improvement, this data model (parent/child) doesn't seem to be that performant with a moderate number of documents. In order for us to really adopt this methodology of using parent/child, we'd expect to see sub-100ms performance so long as we were feeding ES with enough RAM.

My hunch here is there must be some code path that is hit when running on more than 1 data node that either doesn't write to the cache or skips it on the read and hits the disk. We don't have a ton of load on our data nodes, CPU is well under 30% and IOWait is usually under 0.30.

Just to reiterate: when we run the parent/child query on one data node, it runs in less than 100ms; when it runs across two data nodes, it takes >10s. We see this on both version 1.1.2 and 1.3.2.

On Monday, August 25, 2014 10:55:15 AM UTC-4, Clinton Gormley wrote:
Something else to note: parent-child now uses global ordinals to make queries 3x faster than they were previously, but global ordinals need to be rebuilt after the index has refreshed (assuming some data has changed).

Currently there is no way to refresh p/c global ordinals "eagerly" (ie during the refresh phase) and so it happens on the first query after a refresh. 1.3.3 and 1.4.0 will include an option to allow eager building of global ordinals which should remove this latency spike: https://github.com/elasticsearch/elasticsearch/issues/7394

You may want to consider increasing the refresh_interval so that global ordinals remain valid for longer.
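
As a sketch of what that settings change could look like, the snippet below builds (but does not send) a PUT request against the index settings endpoint with a larger refresh_interval. The index name "myindex", the "30s" value, and the helper name are placeholders chosen for illustration, not recommendations from this thread.

```python
import json
from urllib.request import Request


def refresh_interval_request(base, index, interval):
    """Build a PUT <index>/_settings request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(
        f"{base.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host, index name, and interval; adjust for a real cluster.
    req = refresh_interval_request("http://localhost:9200", "myindex", "30s")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would apply it. The trade-off is that newly indexed documents take longer to become searchable, in exchange for global ordinals being rebuilt less often.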

