Query-time per-document authorization

Query-time per-document authorization

Peter Galiovský
Hello,

if one were to integrate elasticsearch with an external access management service that authorized users on a "per view" basis, how should one approach the issue? Let's say that any form of index-side caching of the authorization information is out of the question: every result set needs to be filtered by querying the external access management service. In Solr I can imagine solving this with a custom PostFilter, although that would surely impose a hefty performance penalty. Is there equivalent functionality in elasticsearch? How could the problem be addressed?

Thanks,
Peter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Query-time per-document authorization

Michael Sick
Peter,

There are some common use cases where a post filter could have trouble, e.g. showing the top N documents that match a given query. A filter might (correctly) take all of the results out. You could over-fetch, but if the percentage of docs visible to the user is fairly small you could still miss. You'd also have to decide what to do with aggregates / facets. Can I count docs that I can't see?
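
To make the top-N problem concrete, here is a minimal Python sketch. The ranked hit list, the `allowed` set and the `overfetch_factor` parameter are all made up for illustration; nothing here is an elasticsearch API.

```python
def top_n_after_filter(ranked_ids, allowed, n, overfetch_factor=1):
    """Fetch n * overfetch_factor ranked hits, then drop those the user may not see."""
    fetched = ranked_ids[: n * overfetch_factor]
    visible = [doc_id for doc_id in fetched if doc_id in allowed]
    return visible[:n]


# Hypothetical data: 100 ranked hits, of which the user may see only four.
ranked_ids = list(range(100))
allowed = {24, 49, 74, 99}

print(top_n_after_filter(ranked_ids, allowed, n=3))                      # prints []
print(top_n_after_filter(ranked_ids, allowed, n=3, overfetch_factor=5))  # prints []
```

Even fetching five times the page size returns nothing here, although three visible documents do exist further down the ranking.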

For one project we pushed hashes of the groups and users into arrays. It worked, but only because the permissions did not change often and user-level authorizations were rare. I'm not sure what we'd have done if they had changed more often; we feared heavy reindex costs in that scenario.
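
A rough sketch of what such an approach might look like, in Python. The details are assumptions rather than what that project actually did: the `acl` field name, the SHA-1 based `principal_hash` helper and the choice to store the array on each document are all hypothetical. The only elasticsearch piece used is the `terms` filter.

```python
import hashlib


def principal_hash(kind, name):
    """Hash a principal such as ("group", "sales") into a short opaque token."""
    return hashlib.sha1(f"{kind}:{name}".encode("utf-8")).hexdigest()[:16]


def with_acl(doc, groups=(), users=()):
    """Return a copy of doc carrying an `acl` array of principal hashes (assumed field name)."""
    tokens = [principal_hash("group", g) for g in groups]
    tokens += [principal_hash("user", u) for u in users]
    return {**doc, "acl": sorted(tokens)}


def acl_filter(user, groups):
    """Build a `terms` filter matching documents that list any of the caller's principals."""
    tokens = [principal_hash("user", user)]
    tokens += [principal_hash("group", g) for g in groups]
    return {"terms": {"acl": tokens}}


doc = with_acl({"title": "Q2 forecast"}, groups=["sales"])
print(doc)
print(acl_filter("peter", ["sales", "support"]))
```

Any permission change means rewriting the `acl` array of every affected document, which is where the reindex cost comes from.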

Doubt that helped with a solution but maybe it helped with what's not a solution.

--Mike

Re: Query-time per-document authorization

Karel Minařík
In reply to this post by Peter Galiovský
If I understand correctly, you want to restrict people to seeing only the documents they're allowed to see? First, as Michael writes, filtering the returned results might severely impact the usability/experience for users (no results, etc.).

I think the solution depends on how you embed the information into the documents.

For instance, in the “each user must see only ‘their’ documents” case, you would simply add a `user_id` field to the document and filter on this field, preferably with a `filtered` query. For the “only people in the ‘sales’ department can see these documents” case, you'd use a similar approach, embedding the department names/codes in the document; when the user performs a search, you probably have information about the departments they're part of, and can update the query accordingly.
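
As an illustration, such a request body could be built along these lines in Python. The `restricted_query` helper, the `departments` field name and the sample `match` query are assumptions made for the sketch; `filtered`, `terms` and `match` are the query DSL elements being relied on.

```python
import json


def restricted_query(user_query, field, values):
    """Wrap user_query in a `filtered` query that only matches permitted documents."""
    values = list(values)
    if not values:
        raise ValueError("refusing to build a query without a restriction")
    return {
        "query": {
            "filtered": {
                "query": user_query,
                "filter": {"terms": {field: values}},
            }
        }
    }


user_query = {"match": {"title": "forecast"}}

# "Each user sees only their own documents."
print(json.dumps(restricted_query(user_query, "user_id", ["42"]), indent=2))

# "Only people in the listed departments see these documents."
print(json.dumps(restricted_query(user_query, "departments", ["sales", "support"]), indent=2))
```

Later elasticsearch releases dropped `filtered` in favour of a `bool` query with a `filter` clause; the idea stays the same.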

If “any form of index-side caching of the authorization information is out of question” means that you want to filter the results in 100% real time, then I'm afraid your only solution is to perform a query, get the results, filter them, check whether you've got enough, and if not, repeat the process. I have a bit of a hard time picturing this requirement being accepted as reasonable.
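
That query, filter, check, repeat loop could be sketched roughly as follows. Both callables are hypothetical stand-ins: `search(offset, limit)` for a paged search request and `is_authorized(hit)` for the call to the external service. `batch_size` and `max_batches` are made-up knobs that bound the work.

```python
def authorized_page(search, is_authorized, size, batch_size=50, max_batches=20):
    """Collect up to `size` authorized hits by paging through ranked results."""
    results = []
    if size <= 0:
        return results
    offset = 0
    for _ in range(max_batches):
        hits = search(offset, batch_size)
        if not hits:
            break  # nothing left to fetch
        for hit in hits:
            if is_authorized(hit):
                results.append(hit)
                if len(results) == size:
                    return results
        offset += len(hits)
    return results  # may be shorter than `size`


# Toy stand-ins: 1000 ranked hits, the user may see every 100th one.
corpus = list(range(1000))

def search(offset, limit):
    return corpus[offset:offset + limit]

def is_authorized(hit):
    return hit % 100 == 0

print(authorized_page(search, is_authorized, size=3))  # prints [0, 100, 200]
```

Filling a page of three takes five search round trips and 201 authorization checks in this toy case, and totals or facets computed by the search engine would still count documents the user cannot see.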

Karel

Re: Query-time per-document authorization

Peter Galiovský
Michael, Karel, thank you both for your ideas! I had similar thoughts on this issue. If at all possible, I'll store the security information in the index. I just want to be prepared in case that isn't possible. In that case, most likely a list of "authorized" roles would be stored with each document in the index. At query time, for each candidate search result, I would have to ask the external module: "Does this user have any of the 'authorized' roles on _this_ document?"
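
A small Python sketch of that per-document check, under stated assumptions: the `authorized_roles` field name is invented, `roles_held(user, doc_id)` is a stub standing in for the external module, and hits are assumed to have the `_id` / `_source` shape of a search response. Documents without any listed role are treated as invisible.

```python
def visible_hits(hits, user, roles_held):
    """Keep hits where the user holds at least one of the hit's authorized roles."""
    kept = []
    for hit in hits:
        required = set(hit["_source"].get("authorized_roles", []))
        # One external call per candidate hit: this is the expensive part.
        if required & roles_held(user, hit["_id"]):
            kept.append(hit)
    return kept


# Stub standing in for the external access management service.
GRANTS = {("peter", "doc-1"): {"editor"}, ("peter", "doc-2"): {"guest"}}

def roles_held(user, doc_id):
    return GRANTS.get((user, doc_id), set())

hits = [
    {"_id": "doc-1", "_source": {"authorized_roles": ["editor", "admin"]}},
    {"_id": "doc-2", "_source": {"authorized_roles": ["admin"]}},
    {"_id": "doc-3", "_source": {}},
]
print([hit["_id"] for hit in visible_hits(hits, "peter", roles_held)])  # prints ['doc-1']
```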

As Karel mentions, doing this "post-search" brings a lot of usability issues. That's why Solr's PostFilter looks appealing. The name is actually slightly misleading: it's not a "post-search" filter of the kind mentioned. Instead, it is described as "a mechanism to further filter documents after they have already gone through the main query and other filters. This is appropriate for filters with a very high cost." (http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/search/PostFilter.html) As it's still done "in the search engine", facets, pagination/limit/offset etc. should work as usual.
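
The ordering idea can be shown with a toy model in Python. This is not Solr's `PostFilter` interface and not something elasticsearch provides; every name below is invented for the illustration. The expensive check runs only on documents that survived the query and the cheap filters, and facet counts are taken from what remains.

```python
from collections import Counter


def search_with_post_filter(docs, matches, cheap_filters, expensive_filter, facet_field):
    """Apply the query and cheap filters first; run expensive_filter only on survivors."""
    expensive_calls = 0
    survivors = []
    for doc in docs:
        if not matches(doc):
            continue
        if not all(f(doc) for f in cheap_filters):
            continue
        expensive_calls += 1
        if expensive_filter(doc):
            survivors.append(doc)
    facets = dict(Counter(doc[facet_field] for doc in survivors))
    return survivors, facets, expensive_calls


docs = [
    {"id": 1, "type": "report", "lang": "en", "body": "budget"},
    {"id": 2, "type": "memo", "lang": "en", "body": "budget"},
    {"id": 3, "type": "report", "lang": "de", "body": "budget"},
    {"id": 4, "type": "report", "lang": "en", "body": "holiday"},
]

survivors, facets, calls = search_with_post_filter(
    docs,
    matches=lambda d: "budget" in d["body"],
    cheap_filters=[lambda d: d["lang"] == "en"],
    expensive_filter=lambda d: d["id"] != 2,  # stands in for the remote authorization call
    facet_field="type",
)
print([d["id"] for d in survivors], facets, calls)  # prints [1] {'report': 1} 2
```

Because the facet counts come from the filtered set, they never reveal documents the user cannot see, and the expensive check is made twice rather than four times.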

Perhaps my question then really is: What's the proper way of implementing a custom non-caching filter for elasticsearch? And how to use it in a query such that it is evaluated last?

Peter

Re: Query-time per-document authorization

Michael Sick
Hi Peter,

Sorry that I didn't read up on the post filter before responding. I am confused about how that works with your #1 constraint of no security tokens on the server side. If it's being executed on the server, it must have some security information for comparison. Am I missing something?

--Mike

Re: Query-time per-document authorization

Peter Galiovský
Hi Mike,

my apologies for not being clear enough about what I want to achieve. Perhaps I'm a bit naive, but I was thinking about making a remote call (let's say using some low-overhead web service) to the external security module from the custom filter class. I know this sounds horribly scary from a performance perspective. I just need a backup plan in case storing all the necessary authorization info in the index isn't possible.

Peter

Re: Query-time per-document authorization

Lukáš Vlček
Hi Peter,

maybe you should look at ManifoldCF: http://manifoldcf.apache.org/en_US/index.html
As far as I know, they are implementing a security model for search engines (including Solr and Elasticsearch), though I haven't used it myself.

Regards,
Lukas

Re: Query-time per-document authorization

Hendrik
Maybe this is interesting: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tavroa3Nw5g
