Re: Improving RLS planning

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving RLS planning
Date: 2016-11-08 14:23:39
Message-ID: 20161108142339.GX13284@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> > On 25 October 2016 at 22:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> The alternative I'm now thinking about pursuing is to get rid of the
> >> conversion of RLS quals to subqueries. Instead, we can label individual
> >> qual clauses with security precedence markings.
>
> > +1 for this approach. It looks like it could potentially be much
> > simpler. There's some ugly code in the inheritance planner (and
> > probably one or two other places) that it might be possible to chop
> > out, which would probably also speed up planning times.
>
> Here's a draft patch for this. I've only addressed the RLS use-case
> so far, so this doesn't get into managing the order of application of
> join quals, only restriction quals.

Thanks much for working on this.

Just a few relatively quick comments:

> >> 2. In order_qual_clauses, sort first by security_level and second by cost.
>
> > I think that ordering might be sub-optimal if you had a mix of
> > leakproof quals and security quals and the cost of some security quals
> > were significantly higher than the cost of some other quals. Perhaps
> > all leakproof quals should be assigned security_level 0, to allow them
> > to be checked earlier if they have a lower cost (whether or not they
> > are security quals), and only leaky quals would have a security_level
> > greater than zero. Rule 1 would then not need to check whether the
> > qual was leakproof, and you probably wouldn't need the separate
> > "leakproof" bool field on RestrictInfo.
>
> Hm, but it would also force leakproof quals to be evaluated in front
> of potentially-cheaper leaky quals, whether or not that's semantically
> necessary.

I agree that this is a concern.

> I experimented with ignoring security_level altogether for leakproof
> quals, but I couldn't make it work properly, because that didn't lead to
> a comparison rule that satisfies transitivity. For instance, consider
> three quals:
> A: cost 1, security_level 1, leaky
> B: cost 2, security_level 1, leakproof
> C: cost 3, security_level 0, leakproof
> A should sort before B, since same security_level and lower cost;
> B should sort before C, since lower cost and leakproof;
> but A must sort after C, since higher security_level and leaky.
>
> So what I ended up doing was using your idea of forcing the security
> level to 0 for leakproof quals, but only if they have cost below a
> threshold (which I set at 10X cpu_operator_cost, which should take in
> most built-in functions). That at least limits the possible damage
> from forcing early evaluation of a leakproof qual. There may be
> some better way to do it, though, so I didn't go so far as to remove
> the separate leakproof flag.

I'm not a huge fan of these kinds of magic values, particularly when
they can't be independently managed, but, from a practical perspective,
I have a hard time seeing a real problem with this approach.

> Some other notes:
>
> * This creates a requirement on scan-planning code (and someday on
> join-planning code) to be sure it doesn't create violations of the qual
> ordering rule. Currently only indxpath.c and tidpath.c have to worry
> about that AFAICS. FDWs would need to worry about it too, except that
> we don't currently allow RLS to be enabled on foreign tables. I'm a
> little concerned about whether FDWs could create security holes by
> not accounting for this, but it's moot for now. Custom scan providers
> will need to pay attention as well.

I'd certainly like to support RLS on FDWs (and views...), but we need to
hash out what the exact semantics of that will be and this looks like
something we should be able to address when we look at adding that
support.

> * prepsecurity.c is now dead code and should be removed, but I did not
> include that in this patch, since it would just bloat the patch.

Sure.

> * Accounting for the removal of prepsecurity.c, this is actually a net
> savings of about 300 lines of code. So I feel pretty good about that.
> It also gets rid of some really messy kluges, particularly the behavior
> of generating new subquery RTEs as late as halfway through
> grouping_planner. I find it astonishing that that worked at all.

I'm a bit surprised to hear that as we've been doing that for quite a
while, though looking back on it, I can see why you bring it up.

> * Since the planner is now depending on Query.hasRowSecurity to be set
> whenever there are any securityQuals, I put in an Assert about that,
> and promptly found three places in prepjointree.c and the rewriter where
> we'd been failing to set it. I have not looked to see if these represent
> live bugs in existing releases, but they might. Or am I misunderstanding
> what the flag is supposed to mean?

They're independent, actually. securityQuals can be set via either
security barrier view or from RLS, while hasRowSecurity is specifically
for the RLS case. The reason for the distinction is that changing your
role isn't going to impact security barrier views at all, while it could
impact what RLS policies are used. See extract_query_dependencies().

> * Aside from plan changes, there's one actual behavioral change in the
> regression test results, but I think it's okay because it's a question
> of whether to put a non-RLS qual before or after a leaky qual. That's
> not something we are promising anything about.

Agreed.

> * There's one test query in updatable_views.sql where the plan collapses
> to a dummy (Result with constant false qual) because the planner is now
> able to see that the qual conditions are mutually contradictory. Maybe
> that query needs adjustment; I'm not sure what it's intending to test
> exactly.

Would have to look at the specific test in question, will try to do so
this week, though I'm also planning to be more involved in the CF this
month and want to make progress there.

Thanks again!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2016-11-08 14:27:57 Re: C based plugins, clocks, locks, and configuration variables
Previous Message Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?= 2016-11-08 13:53:26 Re: Re: [PATCH] Tab completion for ALTER TYPE … RENAME VALUE …