Sorting by sum of matching children

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Sorting by sum of matching children

Ólafur Gauti Guðmundsson
Hi all,
I have an index with a parent type (let's call this "Equipment"), and a child type (let's call this "Availability").
An example Equipment document:

{
    _id: 1,
    title: "Bike",
    ...
}

An example Availability document:

{
    _parent: 1,
    day: "2012-04-04",
        available: 2, 
    price_a: 100,
    price_b: 90
}

The prices for the same equipment can be different each day.

The query I want to perform is basically:

Find all equipment that is available on day X and day Y, and sort by price.

The "finding" part is not hard, I use a set of "has_child" queries to find equipment documents where availability is larger than zero during the days in question.
My problem is: how do I sort by the price?

The price for the equipment is the sum of the prices for days X and Y, and the price for the day is the sum of "price_a" and "price_b" for that day.
Therefore, I need to sort by the sum of a field in selected child documents (the ones matching the query).

Is there any way of doing this?
Is there an easier way if the availability documents were nested inside the equipment document?

Thanks a lot!
-OGG
Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

Frederick Cheung
On Apr 2, 2:46 pm, Ólafur Gauti Guðmundsson <[hidden email]>
wrote:

> My problem is: how do I sort by the price?
>
> The price for the equipment is the sum of the prices for days X and Y, and
> the price for the day is the sum of "price_a" and "price_b" for that day.
> Therefore, I need to sort by the sum of a field in selected child documents
> (the ones matching the query).
>
> Is there any way of doing this?
> Is there an easier way if the availability documents were nested inside the
> equipment document?

I don't think that nested documents help in terms of your ordering
problem.

You can kind of do this with top_children:

if you have a query that looks something like

top_children: {
  type: "availability"
  query: {
    custom_score: {
      query: { query on availability here }
      script: "doc['price_a'] + doc['price_b']
    }
  }
}

Then ES runs the query against availability and returns the parent
documents for the matched children, sorted by the score from the query
agains availability. If an equipment can have multiple matching
availabilities then you can tell ES whether the score should be the
max/average/sum of the children scores.

The fiddly bit is that the query against availability is run with an
expected hit count (you can control this), documents that fall outside
of this hit count won't be used. The returned hit count is also not
always trustworthy. Someone is also working on a has_parent filter
that would probably be useful for this kind of stuff

Fred

Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

Ólafur Gauti Guðmundsson
Hi Fred,
Thanks a lot for your help, I really appreciate it.
The "top_children" feature is very interesting. However, in the query part, I must be able to specify that an availability exists for each day in an interval.

For example, I may be searching for for equipment that is available on 2012-04-10 AND 2012-04-11.

In this case: can I somehow specify in the custom_score query insider top_children, that two distinct children MUST exists, one matching each day?
That is, I am only interested in equipment that has BOTH of these availability children:

child 1: (day = 2012-04-10 AND availability > 0)
child 2: (day = 2012-04-11 AND availability > 0)

These children must both exist, and I want the sum of the prices for both these children to score the parent.
Not sure if I can do that?

Thanks!
-OGG

On Monday, April 2, 2012 3:51:17 PM UTC, Frederick Cheung wrote:
On Apr 2, 2:46 pm, Ólafur Gauti Guðmundsson <[hidden email]>
wrote:

> My problem is: how do I sort by the price?
>
> The price for the equipment is the sum of the prices for days X and Y, and
> the price for the day is the sum of "price_a" and "price_b" for that day.
> Therefore, I need to sort by the sum of a field in selected child documents
> (the ones matching the query).
>
> Is there any way of doing this?
> Is there an easier way if the availability documents were nested inside the
> equipment document?

I don't think that nested documents help in terms of your ordering
problem.

You can kind of do this with top_children:

if you have a query that looks something like

top_children: {
  type: "availability"
  query: {
    custom_score: {
      query: { query on availability here }
      script: "doc['price_a'] + doc['price_b']
    }
  }
}

Then ES runs the query against availability and returns the parent
documents for the matched children, sorted by the score from the query
agains availability. If an equipment can have multiple matching
availabilities then you can tell ES whether the score should be the
max/average/sum of the children scores.

The fiddly bit is that the query against availability is run with an
expected hit count (you can control this), documents that fall outside
of this hit count won't be used. The returned hit count is also not
always trustworthy. Someone is also working on a has_parent filter
that would probably be useful for this kind of stuff

Fred

Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

Frederick Cheung


On Apr 2, 7:13 pm, Ólafur Gauti Guðmundsson <[hidden email]>
wrote:

> For example, I may be searching for for equipment that is available on
> 2012-04-10 AND 2012-04-11.
>
> In this case: can I somehow specify in the custom_score query insider
> top_children, that two distinct children MUST exists, one matching each day?
> That is, I am only interested in equipment that has BOTH of these
> availability children:
>
> child 1: (day = 2012-04-10 AND availability > 0)
> child 2: (day = 2012-04-11 AND availability > 0)
>
> These children must both exist, and I want the sum of the prices for both
> these children to score the parent.
> Not sure if I can do that?
>
Hmm, not sure about that. Perhaps using the query filter to filter by
each day tht you need?

Or set the score to be 1000000 + price_a + price_b
and set top_children to use the sum of the children scores.
Assuming the sum of all the price_a and price_b the selecting results
with score > 3000000 (use the min_score option) would be the rows with
3 children, and sorting by score still gives you results sorted by
total price.

Fred

>
>
>
>
>
>
> On Monday, April 2, 2012 3:51:17 PM UTC, Frederick Cheung wrote:
>
> > On Apr 2, 2:46 pm, Ólafur Gauti Guðmundsson <[hidden email]>
> > wrote:
> > > My problem is: how do I sort by the price?
>
> > > The price for the equipment is the sum of the prices for days X and Y,
> > and
> > > the price for the day is the sum of "price_a" and "price_b" for that
> > day.
> > > Therefore, I need to sort by the sum of a field in selected child
> > documents
> > > (the ones matching the query).
>
> > > Is there any way of doing this?
> > > Is there an easier way if the availability documents were nested inside
> > the
> > > equipment document?
>
> > I don't think that nested documents help in terms of your ordering
> > problem.
>
> > You can kind of do this with top_children:
>
> > if you have a query that looks something like
>
> > top_children: {
> >   type: "availability"
> >   query: {
> >     custom_score: {
> >       query: { query on availability here }
> >       script: "doc['price_a'] + doc['price_b']
> >     }
> >   }
> > }
>
> > Then ES runs the query against availability and returns the parent
> > documents for the matched children, sorted by the score from the query
> > agains availability. If an equipment can have multiple matching
> > availabilities then you can tell ES whether the score should be the
> > max/average/sum of the children scores.
>
> > The fiddly bit is that the query against availability is run with an
> > expected hit count (you can control this), documents that fall outside
> > of this hit count won't be used. The returned hit count is also not
> > always trustworthy. Someone is also working on a has_parent filter
> > that would probably be useful for this kind of stuff
>
> > Fred
Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

Ævar Arnfjörð Bjarmason
In reply to this post by Ólafur Gauti Guðmundsson
2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> Is there any way of doing this?
> Is there an easier way if the availability documents were nested inside the
> equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.
Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

Ólafur Gauti Guðmundsson
Hi all,
Thanks for your replies.

@Ævar: Yep, I've considered that. I am exploring a few different options to solve the problem, after which I'll weigh the pros and cons of each one. I am indeed using a relational database, but keeping "processed" views of the data (accumulated availability/capacity pairs, etc.) in ElasticSearch for speedy searches, and making heavy use of facets. 

For now, I'll make do with aggregated prices on the parent document, which will enable me to do the type of sorting I'm after.

Thanks for your suggestions!

Best regards,
OGG

On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:
2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> Is there any way of doing this?
> Is there an easier way if the availability documents were nested inside the
> equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.


On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:
2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> Is there any way of doing this?
> Is there an easier way if the availability documents were nested inside the
> equipment document?

Have you considered just using ElasticSearch for textual searching and
basic filtering, and then doing the logic you want to do in e.g. a
relational database?

E.g. you could get a list of ID's for candidate equipment from ES and
then filter it down somewhere else.

You might be able to get this to work, but you're essentially trying
to do something that's probably better suited for a relational
database in ES.

Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

cole
Hi Ólafur,

It sounds like you've already found a solution, but you should be able
to get at what you're after using nested documents as well.  I'm not
100% sure I have your use case correct, but I've tried to translate it
into a nested document scheme.  The pseudocode below contains a
filtered query with a 'filter' component that filters the result set
down to only those documents with a nested 'availability' document
containing a day you care about and an 'availability' field that is
greater than 0.  The 'query' component applies a custom score that is
the sum of the sum of 'price_a' and 'price_b' for each matching day of
availability.

This pseudocode might not be entirely correct (or optimal).  If you
have any trouble translating it into real code, let me know and I'll
gist a working example.

query
  filtered
    query
      nested
        path: availability
        score_mode: total
        query
          custom_score
            script: "doc[\"price_a\"] + doc[\"price_b\"]"
            query
              terms
                day: ["2012-04-10", "2012-04-11"]
    filter
      and
        nested
          path: availability
          filter
            and
              term
                day: "2012-04-10"
              numeric_range
                availability
                  gt: 0
        nested
          path: availability
          filter
            and
              term
                day: "2012-04-11"
              numeric_range
                availability
                  gt: 0

On Apr 2, 3:07 pm, Ólafur Gauti Guðmundsson <[hidden email]>
wrote:

> Hi all,
> Thanks for your replies.
>
> @Ævar: Yep, I've considered that. I am exploring a few different options to
> solve the problem, after which I'll weigh the pros and cons of each one. I
> am indeed using a relational database, but keeping "processed" views of the
> data (accumulated availability/capacity pairs, etc.) in ElasticSearch for
> speedy searches, and making heavy use of facets.
>
> For now, I'll make do with aggregated prices on the parent document, which
> will enable me to do the type of sorting I'm after.
>
> Thanks for your suggestions!
>
> Best regards,
> OGG
>
>
>
>
>
>
>
> On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:
>
> > 2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> > > Is there any way of doing this?
> > > Is there an easier way if the availability documents were nested inside
> > the
> > > equipment document?
>
> > Have you considered just using ElasticSearch for textual searching and
> > basic filtering, and then doing the logic you want to do in e.g. a
> > relational database?
>
> > E.g. you could get a list of ID's for candidate equipment from ES and
> > then filter it down somewhere else.
>
> > You might be able to get this to work, but you're essentially trying
> > to do something that's probably better suited for a relational
> > database in ES.
>
> On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:
>
> > 2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> > > Is there any way of doing this?
> > > Is there an easier way if the availability documents were nested inside
> > the
> > > equipment document?
>
> > Have you considered just using ElasticSearch for textual searching and
> > basic filtering, and then doing the logic you want to do in e.g. a
> > relational database?
>
> > E.g. you could get a list of ID's for candidate equipment from ES and
> > then filter it down somewhere else.
>
> > You might be able to get this to work, but you're essentially trying
> > to do something that's probably better suited for a relational
> > database in ES.
Reply | Threaded
Open this post in threaded view
|

Re: Sorting by sum of matching children

Ólafur Gauti Guðmundsson
Hi Cole,
This is super cool, thanks a lot! I can confirm that this nested method works.

Thanks very much for your help, really appreciate it :)

Best regards,
OGG

On Wednesday, April 4, 2012 12:06:39 AM UTC, cole wrote:
Hi Ólafur,

It sounds like you've already found a solution, but you should be able
to get at what you're after using nested documents as well.  I'm not
100% sure I have your use case correct, but I've tried to translate it
into a nested document scheme.  The pseudocode below contains a
filtered query with a 'filter' component that filters the result set
down to only those documents with a nested 'availability' document
containing a day you care about and an 'availability' field that is
greater than 0.  The 'query' component applies a custom score that is
the sum of the sum of 'price_a' and 'price_b' for each matching day of
availability.

This pseudocode might not be entirely correct (or optimal).  If you
have any trouble translating it into real code, let me know and I'll
gist a working example.

query
  filtered
    query
      nested
        path: availability
        score_mode: total
        query
          custom_score
            script: "doc[\"price_a\"] + doc[\"price_b\"]"
            query
              terms
                day: ["2012-04-10", "2012-04-11"]
    filter
      and
        nested
          path: availability
          filter
            and
              term
                day: "2012-04-10"
              numeric_range
                availability
                  gt: 0
        nested
          path: availability
          filter
            and
              term
                day: "2012-04-11"
              numeric_range
                availability
                  gt: 0

On Apr 2, 3:07 pm, Ólafur Gauti Guðmundsson <[hidden email]>
wrote:

> Hi all,
> Thanks for your replies.
>
> @Ævar: Yep, I've considered that. I am exploring a few different options to
> solve the problem, after which I'll weigh the pros and cons of each one. I
> am indeed using a relational database, but keeping "processed" views of the
> data (accumulated availability/capacity pairs, etc.) in ElasticSearch for
> speedy searches, and making heavy use of facets.
>
> For now, I'll make do with aggregated prices on the parent document, which
> will enable me to do the type of sorting I'm after.
>
> Thanks for your suggestions!
>
> Best regards,
> OGG
>
>
>
>
>
>
>
> On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:
>
> > 2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> > > Is there any way of doing this?
> > > Is there an easier way if the availability documents were nested inside
> > the
> > > equipment document?
>
> > Have you considered just using ElasticSearch for textual searching and
> > basic filtering, and then doing the logic you want to do in e.g. a
> > relational database?
>
> > E.g. you could get a list of ID's for candidate equipment from ES and
> > then filter it down somewhere else.
>
> > You might be able to get this to work, but you're essentially trying
> > to do something that's probably better suited for a relational
> > database in ES.
>
> On Monday, April 2, 2012 9:53:48 PM UTC, Ævar Arnfjörð Bjarmason wrote:
>
> > 2012/4/2 Ólafur Gauti Guðmundsson <[hidden email]>:
> > > Is there any way of doing this?
> > > Is there an easier way if the availability documents were nested inside
> > the
> > > equipment document?
>
> > Have you considered just using ElasticSearch for textual searching and
> > basic filtering, and then doing the logic you want to do in e.g. a
> > relational database?
>
> > E.g. you could get a list of ID's for candidate equipment from ES and
> > then filter it down somewhere else.
>
> > You might be able to get this to work, but you're essentially trying
> > to do something that's probably better suited for a relational
> > database in ES.