Performance issues with has_parent filters


Xiaolin Xie

Hi Elasticsearch developers,

I am new to ES. We have run into a performance issue with our Elasticsearch system and would like to get your thoughts on it.

Here is our use case: we have three types of documents in one index: “campaign_group”, “campaign”, and “ad”. “campaign_group” is the parent of “campaign”, and “campaign” is the parent of “ad”. Each document type has about 10 simple properties of types such as string, long, and short. All three document types have a “user” property (long) and a “run_status” property (short). Documents are routed by “user”, so all documents with the same “user” land on the same shard.

We have about 1.4 billion documents in total, stored in 200 shards across 3 master nodes and 21 data nodes; each shard has two replicas. The total data size is 1.5 TB. We are running Elasticsearch 1.2.1.

Queries are sent to a specific shard using routing. Query (1) below checks the run_status of “ad” documents (run_status is a short) and takes about 100 milliseconds. Query (2) checks both the run_status of the “ad” and the run_status of its parent, and takes about 2000 milliseconds. It looks like the has_parent filter has a performance problem.

Do you have any thoughts about this problem? Is it expected (because ES cannot support has_parent efficiently)? Could something else be causing it? Or should we upgrade our Elasticsearch version?

Please let me know if you need any other information about our use case.

Any thoughts or ideas will be highly appreciated.

======================== Query(1) ========================

{
  "filter": {
    "and": [
      {
        "term": {
          "user": 1436594776581528
        }
      },
      {
        "terms": {
          "run_status": [1]
        }
      }
    ]
  },
  "sort": {
    "_uid": "desc"
  },
  "size": 1000000,
  "from": 0
}

======================== Query(2) ========================

{
  "filter": {
    "and": [
      {
        "term": {
          "user": 1436594776581528
        }
      },
      {
        "terms": {
          "run_status": [1]
        }
      },
      {
        "has_parent": {
          "parent_type": "campaign",
          "filter": {
            "terms": {
              "run_status": [1]
            }
          }
        }
      }
    ]
  },
  "sort": {
    "_uid": "desc"
  },
  "size": 1000000,
  "from": 0
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/220b1d9a-da80-416c-8b8d-d7cc3efc8b5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Performance issues with has_parent filters

dadoonet
What happens if you change size to 10?

David


Re: Performance issues with has_parent filters

Xiaolin Xie
Hi David

Both queries return only 3 records. If I change the size to 10, the time the two queries take does not change: the first query still takes about 100 milliseconds, and the second one still takes about 2000 milliseconds. Thanks a lot!

Xiaolin.
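
For anyone repeating this comparison, it may help to generate both request bodies from one function, so that the has_parent clause is the only difference between them and the size can be varied. The following is a minimal Python sketch under that assumption; the function name `build_ad_query` and its parameters are hypothetical and only mirror the two queries posted in this thread.

```python
import json
from typing import Any, Dict, List, Optional


def build_ad_query(
    user: int,
    run_statuses: List[int],
    size: int = 10,
    parent_run_statuses: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Build the search body for "ad" documents (hypothetical helper).

    With parent_run_statuses=None the body mirrors Query(1); otherwise a
    has_parent clause on "campaign" is appended, mirroring Query(2).
    """
    clauses: List[Dict[str, Any]] = [
        {"term": {"user": user}},
        {"terms": {"run_status": run_statuses}},
    ]
    if parent_run_statuses is not None:
        clauses.append({
            "has_parent": {
                "parent_type": "campaign",
                "filter": {"terms": {"run_status": parent_run_statuses}},
            }
        })
    return {
        "filter": {"and": clauses},
        "sort": {"_uid": "desc"},
        "size": size,
        "from": 0,
    }


if __name__ == "__main__":
    # Same user, same sort, same size: only the has_parent clause differs.
    query1 = build_ad_query(1436594776581528, [1], size=10)
    query2 = build_ad_query(1436594776581528, [1], size=10,
                            parent_run_statuses=[1])
    print(json.dumps(query1, indent=2))
    print(json.dumps(query2, indent=2))
```

Sending both bodies with the same routing value and the same size keeps the comparison limited to the cost of the has_parent clause.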



Re: Performance issues with has_parent filters

Les Barstow
In reply to this post by Xiaolin Xie
There is in fact a performance difference between has_parent and other filters, as well as a difference in memory/cache use - especially in earlier versions of ES. This is due to the way in which ES has to query the parent/child relationship.

I do believe that there are some significant performance improvements to parent/child documents in 1.3.0+ - check the release notes. Also, I believe there might have been some tuning and monitoring additions in the newer versions that might help you. (I'm a user of our cluster, not so much an administrator, so I'm not so sure on the latter...)

--
Les Barstow, Senior Software Engineer
Return Path, Inc.


Re: Performance issues with has_parent filters

Ron Sher
In reply to this post by Xiaolin Xie
We had a poor experience with has_parent queries and ended up implementing the lookup ourselves in 2 steps:
  1. Filter the parent documents to get a list of IDs.
  2. Filter the child documents, keeping only those whose parent ID is in the list from step 1.
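
The two steps above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the code we actually run: `search` is a hypothetical callable you supply (for example a thin wrapper around your client) that takes a document type and a request body and returns the list of hits, and `campaign_id` is a hypothetical field on each “ad” document holding its parent's ID.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (doc_type, request_body) -> list of hits,
# where every hit is a dict with at least an "_id" key.
SearchFn = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def parent_body(user: int, run_statuses: List[int], size: int) -> Dict[str, Any]:
    """Step 1 body: select the matching parent ("campaign") documents."""
    return {
        "filter": {"and": [
            {"term": {"user": user}},
            {"terms": {"run_status": run_statuses}},
        ]},
        "size": size,
    }


def child_body(user: int, run_statuses: List[int], parent_ids: List[str],
               parent_id_field: str, size: int) -> Dict[str, Any]:
    """Step 2 body: select children whose parent ID is in parent_ids."""
    return {
        "filter": {"and": [
            {"term": {"user": user}},
            {"terms": {"run_status": run_statuses}},
            # Assumption: the child stores its parent's ID in a plain field.
            {"terms": {parent_id_field: parent_ids}},
        ]},
        "size": size,
    }


def two_step_search(search: SearchFn, user: int,
                    child_statuses: List[int], parent_statuses: List[int],
                    parent_id_field: str = "campaign_id",
                    size: int = 1000) -> List[Dict[str, Any]]:
    """Emulate has_parent with two plain filtered searches."""
    parents = search("campaign", parent_body(user, parent_statuses, size))
    parent_ids = [hit["_id"] for hit in parents]
    if not parent_ids:
        # No parent matched, so no child can match either.
        return []
    return search("ad", child_body(user, child_statuses, parent_ids,
                                   parent_id_field, size))
```

Note that step 1 returns at most `size` parents, so the sketch silently misses children of any parents beyond that limit, and a very long ID list makes the second request large. Both points are worth checking against your own data before relying on this approach.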


Re: Performance issues with has_parent filters

Xiaolin Xie
Hi Ron.

Thanks a lot for the information. We are considering this approach for our case.

Xiaolin.


Re: Performance issues with has_parent filters

Xiaolin Xie
In reply to this post by Les Barstow
Hi Les

The release notes for 1.3.0 (http://www.elasticsearch.org/downloads/1-3-0/) do not mention performance improvements to parent/child queries, and I did not find any in the 1.4.0 notes (http://www.elasticsearch.org/downloads/1-4-0/) either. How did you find out that there are significant performance improvements to parent/child queries? What kind of improvements were made, and how significant are they?

Thanks a lot for the help.

Xiaolin.
