Searching complex parent and child docs in the same query.

tbrianjones

I am storing information about companies in an ES index. Right now I have a `company` type and a `file` type, and each `file` doc has a `company` doc as its parent. I would like to search companies by both the data in the `company` docs and the data in their child `file` docs. I have been unable to figure out how to do this and am considering storing the files as nested objects inside the `company` docs instead. My concern is that this will create massive `company` docs and cause unforeseen problems, since a company can have thousands of files associated with it.

Company docs have a lot of information that I need to search by: geolocation data, certifications, descriptions, titles, etc.

File docs also have a lot of information, like title, description, and keywords, as well as the content of the actual file (PDF, Word, PPT).

Searching with a `top_children` query won't take the company doc data into account.

Filtering companies by a query on their children won't let me add a weighting based on the file results.

What should I do?  Will including thousands of `file` docs ( with file attachments ) as nested objects inside the parent doc cause problems?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Searching complex parent and child docs in the same query.

ppearcy
Would it be possible to push the company info into each file doc? This would be painful, though, if the company data changes. 

I think your only other option is nested documents. This means that whenever a file or the company data changes, the whole company doc, including all of its nested files, gets re-indexed. Will this work at your scale? I have no clue :-)
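For reference, the nested option could look roughly like the sketch below. It only builds the mapping and the search body as plain Python dicts and does not talk to a cluster. The field names (`name`, `files`, `title`, `content`) and the helper `nested_company_query` are made up for illustration, and the exact mapping types and `score_mode` values differ a bit between Elasticsearch versions, so treat it as a starting point rather than the definitive syntax.

```python
# Hypothetical mapping: each company doc embeds its files as nested objects.
# "string" was the text field type in 2013-era Elasticsearch; newer versions use "text".
COMPANY_MAPPING = {
    "company": {
        "properties": {
            "name": {"type": "string"},
            "files": {
                "type": "nested",
                "properties": {
                    "title": {"type": "string"},
                    "content": {"type": "string"},
                },
            },
        }
    }
}


def nested_company_query(text):
    """Build a search body that scores on company fields and nested file fields together."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Clause on the company's own data (hypothetical field).
                    {"match": {"name": text}},
                    # Clause on the embedded files; score_mode controls how the
                    # scores of matching nested docs are folded into the company score.
                    {
                        "nested": {
                            "path": "files",
                            "score_mode": "avg",
                            "query": {"match": {"files.content": text}},
                        }
                    },
                ]
            }
        }
    }
```

The hits here are company docs, and both clauses contribute to the score. The cost is the one mentioned above: changing any single file means re-indexing the whole company doc.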

So, you have to denormalize one way or another. Pushing the company data into each file seems to be the cleaner approach.
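A minimal sketch of what pushing the company data into each file doc might look like on the indexing side. The helper names (`denormalize_file`, `reindex_payloads`) and the field names are hypothetical; this only shapes the documents in Python and leaves the actual indexing call to whatever client you use.

```python
def denormalize_file(file_doc, company_doc,
                     company_fields=("name", "location", "certifications")):
    """Return a copy of file_doc with selected company fields copied under 'company'."""
    merged = dict(file_doc)  # shallow copy so the caller's file doc is left untouched
    merged["company"] = {
        field: company_doc[field]
        for field in company_fields
        if field in company_doc
    }
    return merged


def reindex_payloads(company_doc, file_docs):
    """Rebuild every file doc of a company; needed whenever the company data changes."""
    return [denormalize_file(file_doc, company_doc) for file_doc in file_docs]
```

The second function is where the pain is: one change to a company with thousands of files means thousands of file docs to re-send.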

Best Regards,
Paul

On Thursday, March 28, 2013 9:31:14 PM UTC-6, Brian Jones wrote: [quoted original message omitted]
 
 

Re: Searching complex parent and child docs in the same query.

q42jaap
You can issue the query on the `file` type and use the `has_parent` query or filter to query on properties of the company.

Elasticsearch holds the list of `_id`s of the matching parents in memory and uses it as a filter for the file docs.
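Something along these lines, as a rough sketch. It just builds the search body as a Python dict; the field names (`content`, `description`) and the function name are invented for the example, and `parent_type` is the parameter name used with typed parent/child mappings, so check the `has_parent` docs for the version you run.

```python
def files_for_matching_companies(file_text, company_text):
    """Build a search body for file docs whose parent company also matches a query."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Clause on the file's own data (hypothetical field).
                    {"match": {"content": file_text}},
                    # Clause evaluated against the parent company doc.
                    {
                        "has_parent": {
                            "parent_type": "company",
                            "query": {"match": {"description": company_text}},
                        }
                    },
                ]
            }
        }
    }
```

You would send this body to the search endpoint for the `file` type. Note that the hits are file docs, not companies, so grouping results per company is left to the client.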


Jaap Taal
 
[ Q42 BV | tel 070 44523 42 | direct 070 44523 65 | http://q42.nl | Waldorpstraat 17F, Den Haag | Vijzelstraat 72 unit 4.23, Amsterdam | KvK 30164662 ]


On Fri, Mar 29, 2013 at 7:31 AM, ppearcy wrote: [quoted messages omitted]