Re: policies with security definer option for allowing inline optimization

From: Dan Lynch <pyramation(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: policies with security definer option for allowing inline optimization
Date: 2021-04-02 21:24:59
Message-ID: CA+_muLEz6dBjn_Y9FHkovJXHYL8-VZkaGGg4eGqDG4aNu9aziQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

My goal is to use RLS for everything, including SELECTs, so it's super
important to consider every performance tweak possible. Appreciate any
insights or comments. I'm also hoping to document this better for
application developers who want to use postgres and RLS.

Does anyone know details of, or where to find more information about the
implications of the optimizer on the quals/checks for the policies being
functions vs inline?

It seems that the solution today may be that we have to write functions
with security definer. I also saw Joe's linked in share regarding an
article using inline functions
<https://blog.crunchydata.com/blog/a-postgresql-row-level-security-primer-creating-large-policies>
in the qual/checks to solve a policy size issue, but also wondering the
performance implications of inline vs functions:

Imagine you need to do a JOIN to check an owned table against an acl table

(group_id = ANY (ARRAY (
SELECT
acl.entity_id
FROM
org_memberships_acl acl
JOIN app_groups obj ON acl.entity_id = obj.owner_id
WHERE
acl.actor_id = current_user_id())))

you could wrap that query into a function (so we can apply SECURITY DEFINER
to the tables involved to avoid nested RLS lookups)

(group_id = ANY (ARRAY (
get_group_ids_of_current_user()
)))

Does anyone here know how the optimizer would handle this? I suppose if the
get_group_ids_of_current_user() function is marked as STABLE, would the
optimizer cache this value for every row in a SELECT that returned
multiple rows? Is it possible that if the function is sql vs plpgsql it
makes a difference?

Am I splitting hairs here, and maybe this is a trivial nuance that
shouldn't really matter for performance? If it's true that inline functions
would perform bretter, then definitely this thread and potentially feature
request seems pretty important.

*Other important RLS Performance Optimizations*

I also want to share my research from online so it's documented somewhere.
I would love to get more information to formally document these
optimizations. Here are the two articles I've found to be useful for how to
structure RLS policies performantly:

https://cazzer.medium.com/designing-the-most-performant-row-level-security-strategy-in-postgres-a06084f31945

https://medium.com/@ethanresnick/there-are-a-few-faster-ways-that-i-know-of-to-handle-the-third-case-with-rls-9d22eaa890e5

The 2nd article was particularly useful (which was written in response to this
article
<https://medium.com/@bartels/using-postgresql-row-level-security-rls-to-authorize-read-queries-for-your-applications-users-a2838d2afb92>),
highlighting an important detail that should probably be more explicit for
folks writing policies, especially for SELECT policies. Essentially it
boils down to not passing properties from the rows into the functions used
to check security, but instead inverting the logic and instead returning
the identifiers as an array and checking if the row's owned key matches one
of those identifiers.

For example,

a GOOD qual/check expr

owner_id = ANY ( function_that_gets_current_users_organization_ids() )

a BAD qual/check expr

can_user_access_object(owner_id)

The main benefit of the first expr is that if
function_that_gets_current_users_organization_ids is STABLE, the optimizer
can run this once for all rows, and thus for SELECTs should actually run
fast. The 2nd expr takes as an argument the column, which would have to run
for every single row making SELECTs run very slow depending on the function.

This actually is pretty intuitive once you look at it. Reversing the logic
and returning IDs makes sense when you imagine what PG has to do in order
to check rows, I suppose there are limitations depending on the cardinality
of the IDs returned and postgres's ability to check some_id = ANY (array)
for large arrays.

Dan Lynch
(734) 657-4483

On Fri, Apr 2, 2021 at 7:47 AM Joe Conway <mail(at)joeconway(dot)com> wrote:

> On 4/2/21 10:23 AM, Stephen Frost wrote:
> > Greetings,
> >
> > * Joe Conway (mail(at)joeconway(dot)com) wrote:
> >> On 4/2/21 9:57 AM, Isaac Morland wrote:
> >> >Views already run security definer, allowing them to be used for some
> of
> >> >the same information-hiding purposes as RLS. But I just found something
> >> >strange: current_user/_role returns the user's role, not the view
> owner's
> >> >role:
> >>
> >> >postgres=# set role to t1;
> >> >SET
> >> >postgres=> table tt;
> >> >ERROR: permission denied for table tt
> >> >postgres=> table tv;
> >> > ?column? | current_user
> >> >----------+--------------
> >> > 5 | t1
> >> >(1 row)
> >> >
> >> >postgres=>
> >> >
> >> >Note that even though current_user is t1 "inside" the view, it is still
> >> >able to see the contents of table tt. Shouldn't current_user/_role
> return
> >> >the view owner in this situation? By contrast security definer
> functions
> >> >work properly:
> >>
> >> That is because while VIEWs are effectively SECURITY DEFINER for table
> >> access, functions running as part of the view are still SECURITY
> INVOKER if
> >> they were defined that way. And "current_user" is essentially just a
> special
> >> grammatical interface to a SECURITY INVOKER function:
> >
> > Right- and what I was really getting at is that it'd sometimes be nice
> > to have the view run as 'security invoker' for table access. In
> > general, it seems like it'd be useful to be able to control each piece
> > and define if it's to be security invoker or security definer. We're
> > able to do that for functions, but not other parts of the system.
>
> +1
>
> Agreed -- I have opined similarly in the past
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-04-02 22:55:54 Re: Using COPY FREEZE in pgbench
Previous Message Marko Tiikkaja 2021-04-02 21:09:09 Re: [PATCH] Implement motd for PostgreSQL