Elasticsearch query performance

VB
I'm using elasticsearch to index two types of objects -

Data details -

Contract object ~ 60 properties (Object size - 120 bytes)
Risk Item Object ~ 125 properties (Object size - 250 bytes)

Contract is parent of risk item (_parent)

I'm storing 240 million such objects in a single index (210 million risk items, 30 million contracts)
Index size is - 322 GB

Cluster details -

11 m2.4xlarge EC2 boxes [68 GB memory, 1.6 TB storage, 8 cores] (1 box is a load balancer node with node.data = false)
50 shards
1 replica

===
elasticsearch.yml -

node.data: true

http.enabled: false

index.number_of_shards: 50

index.number_of_replicas: 1

index.translog.flush_threshold_ops: 10000

index.merge.policy.use_compound_files: false

indices.memory.index_buffer_size: 30%

index.refresh_interval: 30s

index.store.type: mmapfs

path.data: /data-xvdf,/data-xvdg

===

I'm starting the elasticsearch nodes with the following command - /home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g

My problem is that the following query on the risk item type takes about 10-15 seconds to return data.

I'm running this with a load of 50 concurrent users and a bulk index load of about 5000 risk items happening in parallel.

Query -

http://<load balancer host name>:9200/contractindex/riskitem/_search

{
    "query": {
        "has_parent": {
            "parent_type": "contract",
            "query": {
                "range": {
                    "ContractDate": {
                        "gte": "2010-01-01"
                    }
                }
            }
        }
    },
    "filter": {
        "and": [
            {
                "query": {
                    "bool": {
                        "must": [
                            {
                                "query_string": {
                                    "fields": ["RiskItemProperty1"],
                                    "query": "abc"
                                }
                            },
                            {
                                "query_string": {
                                    "fields": ["RiskItemProperty2"],
                                    "query": "xyz"
                                }
                            }
                        ]
                    }
                }
            }
        ]
    }
}

Can somebody please help me improve the performance of this query?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Elasticsearch query performance

Ivan Brusic
Can you profile the query without the indexing process running in parallel? The index_buffer_size setting seems high compared to the default, and your bulk load (about 5000 risk items at roughly 250 bytes each) should only be just over 1 MB.
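As a rough check of those numbers, here is a small Python sketch. It assumes the per-document size quoted in the original post and ignores bulk action lines and JSON overhead, so treat the result as an order-of-magnitude estimate only. The index buffer figure assumes the percentage is taken from the 30 GB heap set with -Xmx30g.

```python
# Sizes quoted in the original post; JSON and bulk-action overhead are ignored.
RISK_ITEMS_PER_BULK = 5000
BYTES_PER_RISK_ITEM = 250

# Heap size from the -Xmx30g flag, buffer fraction from elasticsearch.yml.
HEAP_GB = 30
INDEX_BUFFER_FRACTION = 0.30

# Approximate payload of one bulk request, in megabytes.
bulk_mb = RISK_ITEMS_PER_BULK * BYTES_PER_RISK_ITEM / 1_000_000

# Heap reserved for the indexing buffer on each node, in gigabytes.
index_buffer_gb = HEAP_GB * INDEX_BUFFER_FRACTION

print(f"one bulk request: about {bulk_mb:.2f} MB")
print(f"index buffer per node: about {index_buffer_gb:.1f} GB")
```

Under these assumptions the buffer is several thousand times larger than a single bulk request, which is why the setting looks oversized for this workload.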

The has_parent query could easily be turned into a filter so that you can take advantage of filter caching. Is scoring important for that query? I am assuming it is not, since it is a range query.

Cheers,

Ivan


In reply to VB's post of Thu, Aug 15, 2013 at 7:04 PM.

Re: Elasticsearch query performance

VB
Ivan,

Thanks for the reply.

We are new to elasticsearch, and yes, we did run the search queries without indexing and it still takes around 10 secs.

We can reduce the buffer size or remove that setting from the yml. Can we remove/change it after the indexes are created, or do we need to create the indexes again? Does it need a server restart, or can we call the update settings API?

It would be highly appreciated if you could provide a filter version of the query; scoring is not important.

Regards,
VB  



In reply to Ivan Brusic's post of Friday, 16 August 2013 10:28:09 UTC-7.

Re: Elasticsearch query performance

VB
We have removed the buffer size setting and restarted the cluster/nodes.

The query is still taking around 10 seconds, and CPU on all servers is maxing out.

We also looked for documentation on changing has_parent/has_child queries to normal filter queries but could not find anything; any input would be useful.

Regards,
VB


In reply to VB's post of Friday, 16 August 2013 13:50:51 UTC-7.

Re: Elasticsearch query performance

AlexR
Hi VB,

I do not know your use case, but have you considered denormalizing your data? In other words, storing the parent object as part of its children's JSON. Your query and facet performance will be much better, but the price to pay is having to update every child record if the parent changes. Plus, if the majority of your searches need to return the parent, you would need some way of picking out a single parent record from potentially multiple hits on this parent/child. Still, it may be worth it. Your parent record is pretty small, so the key is just how often it changes. Maybe bring only the parent fields you really need for searching into the child records?
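To make the idea concrete, here is a minimal Python sketch of copying selected parent fields into a child document before indexing. The `denormalize` helper, the `Contract_` prefix and the sample values are all invented for illustration; only the field names come from this thread. The query at the end assumes the copied date field is mapped as a date on the risk item type.

```python
import json

def denormalize(contract, risk_item, parent_fields, prefix="Contract_"):
    """Return a copy of risk_item with selected contract fields copied in.

    The prefix is a made-up convention that keeps copied fields from
    clashing with the child's own field names.
    """
    doc = dict(risk_item)
    for field in parent_fields:
        doc[prefix + field] = contract[field]
    return doc

# Sample values are invented; the field names appear elsewhere in the thread.
contract = {"ContractDate": "2011-05-01", "LineOfBusiness": "LOBValue1"}
risk_item = {"RiskItemProperty1": "abc", "RiskItemProperty2": "xyz"}

# Copy only the parent field that the search actually needs.
flat_doc = denormalize(contract, risk_item, ["ContractDate"])

# With the date stored on the child, a plain range filter can stand in for has_parent.
query = {
    "query": {
        "filtered": {
            "query": {"match": {"RiskItemProperty1": "abc"}},
            "filter": {"range": {"Contract_ContractDate": {"gte": "2010-01-01"}}},
        }
    }
}

print(json.dumps(flat_doc, indent=4))
print(json.dumps(query, indent=4))
```

The trade-off is the one described above: every risk item that carries a copied field has to be reindexed whenever that field changes on its contract.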
Alex


Re: Elasticsearch query performance

VB
Alex, we cannot go with denormalizing data; as you mentioned, it would mean updating each parent document for any change to any attribute of the child document. Is there anything else you can propose?

In general, our queries on a single type are also slow.

This query takes around 8 seconds.

{
    "query": {
        "constant_score": {
            "filter": {
                "and": [
                    {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
                    {"term": {"Address_Admin2Name": "Admin2Name1"}}
                ]
            }
        }
    }
}

This query takes around 6.5 seconds for the top 10 records (but it has a sort on top of it).

{
    "query": {
        "constant_score": {
            "filter": {
                "and": [
                    {"term": {"Insurer": "Insurer1"}},
                    {"term": {"Status": "Status1"}}
                ]
            }
        }
    }
}

But all our queries use random values drawn from a few random sets of data.

In reply to AlexR's post of Saturday, 17 August 2013 13:59:56 UTC-7.

Re: Elasticsearch query performance

Matt Weber-2
You should use a Bool Filter with must clauses, read this:  http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
                        {"term": {"Address_Admin2Name": "Admin2Name1"}}
                    ]
                }
            }
        }
    }
}
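If there are many such queries to convert, the rewrite can be done mechanically. The Python sketch below uses a hypothetical helper, `and_to_bool`, and only handles the array form of the and filter used in this thread. As I read the linked article, the swap pays off for bitset-based filters such as term and range; other filter types may be better left under and/or/not.

```python
import json

def and_to_bool(filter_body):
    """Rewrite {"and": [...]} as {"bool": {"must": [...]}}, recursing into clauses.

    Any filter that is not a bare array-style "and" is returned unchanged.
    """
    clauses = filter_body.get("and")
    if len(filter_body) == 1 and isinstance(clauses, list):
        return {"bool": {"must": [and_to_bool(clause) for clause in clauses]}}
    return filter_body

# The second slow query from this thread, as originally written.
original = {
    "and": [
        {"term": {"Insurer": "Insurer1"}},
        {"term": {"Status": "Status1"}},
    ]
}

converted = and_to_bool(original)
search_body = {"query": {"constant_score": {"filter": converted}}}

print(json.dumps(search_body, indent=4))
```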

Thanks,
Matt Weber


In reply to VB's post of Mon, Aug 19, 2013 at 2:23 PM.

Re: Elasticsearch query performance

VB
Thanks Matt.

Can we use this in our parent/child queries? And how do we write parent/child queries without using has_parent/has_child?

And is there a rule of thumb about when to use bool and when not to use it?

Regards,
VB

In reply to Matt Weber's post of Monday, 19 August 2013 14:37:38 UTC-7.

Re: Elasticsearch query performance

AlexR
In reply to this post by VB
I meant the opposite: denormalize the parent into the child. You will not need to update the parent on a child change, but all children on a parent change, which hopefully will be less frequent. And perhaps you only need some parent fields in the child, which would make relevant parent changes less frequent.

I wonder if the bool filter will make a dramatic difference; please let us know.

Alex


Re: Elasticsearch query performance

joergprante@gmail.com
In reply to this post by VB
With your ES cluster node config, you tell ES that it should fill 30g of heap for filter/cache. Do you use warming?

Another observation is that your index is 322g across 11 nodes, which makes ~30g per node and you have assigned 64g - 30g = 34g to file system and other so your whole 322g files will fit into the file system cache.

My opinion is that 10s is blazingly fast to fill ~30g from the file system, prepare your filter query in the heap which may use up to another 30g, and execute the query plus delivering results.
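The arithmetic behind this estimate can be reproduced with a few lines of Python. The figures are the ones quoted in this thread; note that the original post lists 68 GB of memory and says one of the 11 boxes holds no data, so the sketch also shows the per-node share across 10 data nodes. Whether the 322 GB figure includes the replica is not stated, so this is only a rough sanity check.

```python
INDEX_GB = 322      # index size from the original post
NODES = 11          # all boxes, as used in the estimate above
DATA_NODES = 10     # one box is a non-data load balancer node
RAM_GB = 64         # figure used in the estimate above; the original post lists 68
HEAP_GB = 30        # from -Xms30g -Xmx30g

# Share of the index per box, counting every box and then only data nodes.
per_node_gb = INDEX_GB / NODES
per_data_node_gb = INDEX_GB / DATA_NODES

# Memory left outside the JVM heap for the file system cache and everything else.
fs_cache_gb = RAM_GB - HEAP_GB

# Does a data node's share of the index fit in the memory left outside the heap?
fits_in_cache = per_data_node_gb <= fs_cache_gb

print(f"index per node (11 nodes): {per_node_gb:.1f} GB")
print(f"index per data node (10 nodes): {per_data_node_gb:.1f} GB")
print(f"memory left outside the heap: {fs_cache_gb} GB")
print(f"fits in file system cache: {fits_in_cache}")
```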

Jörg


Re: Elasticsearch query performance

Matt Weber-2
In reply to this post by VB
Yes, you can and should use bool query/filter with your parent/child queries.  Read the article that I linked to know when and when not to use them.  Looking at your original query, I would probably go with something like this:

{
    "query": {
        "filtered" : {
            "query": {
                "bool": {
                    "must": [
                        {"match": {"RiskItemProperty1": "abc"}},
                        {"match": {"RiskItemProperty2": "xyz"}}
                    ]
                }
            },
            "filter": {
                "has_parent": {
                    "parent_type": "contract",
                    "filter": {
                        "range": {
                            "ContractDate": {
                                "gte": "2010-01-01"
                            }
                        }
                    }
                }
            }
        }
    }
}

Remember that your first couple of has_parent or has_child filters and queries are going to be slower, because the id cache has to be loaded into memory.
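One way to move that first-hit cost away from user queries may be the index warming mentioned earlier in this thread. The Python sketch below only builds the request and sends nothing. It assumes the warmer endpoint of the 0.90 line (PUT /{index}/{type}/_warmer/{name}); the helper `warmer_request` and the warmer name are hypothetical, while the index and type names come from the original post. Whether a warmer fully preloads the id cache on this version is worth verifying on a test node.

```python
import json

def warmer_request(index, doc_type, warmer_name, search_body):
    """Build (method, path, body) for registering a warmer; nothing is sent."""
    path = "/{0}/{1}/_warmer/{2}".format(index, doc_type, warmer_name)
    return "PUT", path, json.dumps(search_body)

# A search that exercises the has_parent filter, so running it touches
# the parent/child id cache before real traffic arrives.
warm_search = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "has_parent": {
                    "parent_type": "contract",
                    "filter": {"range": {"ContractDate": {"gte": "2010-01-01"}}},
                }
            },
        }
    }
}

method, path, body = warmer_request(
    "contractindex", "riskitem", "parent_child_warmer", warm_search
)

print(method, path)
print(body)
```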

Thanks,
Matt Weber




In reply to VB's post of Mon, Aug 19, 2013 at 2:46 PM.

Re: Elasticsearch query performance

VB
Thanks Matt. I will run through this and post my observations.

In reply to Matt Weber's post of Monday, 19 August 2013 15:00:42 UTC-7.

Re: Elasticsearch query performance

VB
Hi all,

We made the changes suggested by Matt to use bitsets.

We ran 50 concurrent users (read only) for an hour. All our queries are now 4 to 5 times faster, except the parent/child query (the query in question), which has gone down from 7 seconds to 3 seconds.

Matt, thank you so much for helping us. Is there anything else we can do for the parent/child query, or in general?

I have one more query with has_child in it. Do you think we can further improve this one?


{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [
                        {"match": {"LineOfBusiness": "LOBValue1"}}
                    ]
                }
            },
            "filter": {
                "has_child": {
                    "type": "riskitem",
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {"Address_Admin1Name": "Admin1Name1"}}
                            ]
                        }
                    }
                }
            }
        }
    }
}
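
One small simplification that may be worth trying: a bool with a single must clause should match the same documents as the clause on its own, so both single-clause bool wrappers can be dropped. The sketch below is untested and is mainly a readability change; any effect on query time is an assumption that would need to be measured.

```json
{
    "query": {
        "filtered": {
            "query": {
                "match": {"LineOfBusiness": "LOBValue1"}
            },
            "filter": {
                "has_child": {
                    "type": "riskitem",
                    "filter": {
                        "term": {"Address_Admin1Name": "Admin1Name1"}
                    }
                }
            }
        }
    }
}
```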

Regards,
VB.

On Monday, 19 August 2013 15:04:58 UTC-7, VB wrote:
Thanks Matt. I will run through this and post my observations.

On Monday, 19 August 2013 15:00:42 UTC-7, Matt Weber wrote:
Yes, you can and should use bool query/filter with your parent/child queries. Read the article that I linked to learn when and when not to use them. Looking at your original query, I would probably go with something like this:

{
    "query": {
        "filtered" : {
            "query": {
                "bool": {
                    "must": [
                        {"match": {"RiskItemProperty1": "abc"}},
                        {"match": {"RiskItemProperty2": "xyz"}}
                    ]
                }
            },
            "filter": {
                "has_parent": {
                    "parent_type": "contract",
                    "filter": {
                        "range": {
                            "ContractDate": {
                                "gte": "2010-01-01"
                            }
                        }
                    }
                }
            }
        }
    }
}

Remember that your first couple of has_parent or has_child filters and queries are going to be slower because the id cache is being loaded into memory.
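
One way to move that first-query cost out of user requests might be an index warmer. The sketch below is a search body that could be registered with something like PUT /contractindex/riskitem/_warmer/parent_child_warmup, where the warmer name parent_child_warmup is made up for illustration. It assumes the warmer API is available in the version in use, and whether running it actually preloads the id cache is an assumption that should be verified on a test cluster.

```json
{
    "query": {
        "constant_score": {
            "filter": {
                "has_parent": {
                    "parent_type": "contract",
                    "filter": {
                        "range": {
                            "ContractDate": {"gte": "2010-01-01"}
                        }
                    }
                }
            }
        }
    }
}
```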

Thanks,
Matt Weber




On Mon, Aug 19, 2013 at 2:46 PM, VB <[hidden email]> wrote:
Thanks Matt.

Can we use this on our parent/child queries? And how would we write parent/child queries without using has_parent/has_child?

Also, is there a rule of thumb about when to use bool and when not to use it?

Regards,
VB

On Monday, 19 August 2013 14:37:38 UTC-7, Matt Weber wrote:
You should use a bool filter with must clauses. Read this: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
                        {"term": {"Address_Admin2Name": "Admin2Name1"}}
                    ]
                }
            }
        }
    }
}

Thanks,
Matt Weber


On Mon, Aug 19, 2013 at 2:23 PM, VB <[hidden email]> wrote:
Alex, we cannot go with denormalizing the data; as you mentioned, it would mean updating each parent document whenever any attribute of a child document changes. Is there anything else you can propose?

In general, our queries against a single table are also slow.

This query takes around 8 seconds.

{
    "query": {
        "constant_score": {
            "filter": {
                "and": [
                    {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
                    {"term": {"Address_Admin2Name": "Admin2Name1"}}
                ]
            }
        }
    }
}


Re: Elasticsearch query performance

VB
And here is one more query of this type that needs improvement.

{
    "query": {
        "bool": {
            "must": [
                {"range": {"InceptionDate": {"gt": "2009-01-01"}}},
                {"range": {"ExpirationDate": {"lt": "2013-01-01"}}},
                {
                    "has_child": {
                        "type": "riskitem",
                        "query": {
                            "filtered": {
                                "filter": {
                                    "or": [
                                        {
                                            "bool": {
                                                "must": [
                                                    {"term": {"Address_Admin2Name": "tureni"}},
                                                    {"term": {"Address_Admin2Name_US": "burlington"}},
                                                    {"term": {"CommonCharacteristic_BuildingClass": "62"}}
                                                ]
                                            }
                                        },
                                        {
                                            "bool": {
                                                "must": [
                                                    {"term": {"CommonCharacteristic_ConstructionName": "heavy"}},
                                                    {"term": {"CommonCharacteristic_BuildingScheme": "rms"}},
                                                    {"terms": {"CommonCharacteristic_ValuationType": ["reported", "reported"]}}
                                                ]
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    }
                }
            ]
        }
    }
}
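
A possible rewrite along the lines of the bool filter advice earlier in the thread is sketched below. It is untested and rests on a few assumptions: relevance scores are not needed (so everything moves into a constant_score filter), the or filter can be replaced by a bool filter with should clauses (with only should clauses, at least one of them has to match), and the duplicated "reported" value in the terms filter can be collapsed into a single term.

```json
{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {"range": {"InceptionDate": {"gt": "2009-01-01"}}},
                        {"range": {"ExpirationDate": {"lt": "2013-01-01"}}},
                        {
                            "has_child": {
                                "type": "riskitem",
                                "filter": {
                                    "bool": {
                                        "should": [
                                            {
                                                "bool": {
                                                    "must": [
                                                        {"term": {"Address_Admin2Name": "tureni"}},
                                                        {"term": {"Address_Admin2Name_US": "burlington"}},
                                                        {"term": {"CommonCharacteristic_BuildingClass": "62"}}
                                                    ]
                                                }
                                            },
                                            {
                                                "bool": {
                                                    "must": [
                                                        {"term": {"CommonCharacteristic_ConstructionName": "heavy"}},
                                                        {"term": {"CommonCharacteristic_BuildingScheme": "rms"}},
                                                        {"term": {"CommonCharacteristic_ValuationType": "reported"}}
                                                    ]
                                                }
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}
```

If scores do matter, the two range clauses could stay in the query part of a filtered query and only the has_child part would move into the filter.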

On Tuesday, 20 August 2013 10:24:48 UTC-7, VB wrote:
Hi all,

We made changes as suggested by Matt to use bitsets. 


Re: Elasticsearch query performance

VB
Can anyone please comment/help?

On Tuesday, 20 August 2013 11:33:39 UTC-7, VB wrote:
And one more of this type which needs improvement.
