Arguable RLS security bug, EvalPlanQual() paranoia

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Arguable RLS security bug, EvalPlanQual() paranoia
Date: 2015-06-01 07:29:24
Message-ID: CAM3SWZScG+S17vwT+E82o=aNrjqar6=kCoAnGf+vw=n4PaAgCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Very minor concern about RLS
========================

This existing ExecUpdate() comment seems a little inaccurate:

/*
* Check any RLS UPDATE WITH CHECK policies
*
* If we generate a new candidate tuple after EvalPlanQual testing, we
* must loop back here and recheck any RLS policies and constraints.
* (We don't need to redo triggers, however. If there are any BEFORE
* triggers then trigger.c will have done heap_lock_tuple to lock the
* correct tuple, so there's no need to do them again.)

Because ExecBRUpdateTriggers() does an UPSERT-style heap_lock_tuple()
call, restarting within ExecUpdate() should be impossible once
ExecBRUpdateTriggers() returns and returns a value != NULL (this is
*essential* for UPSERT, but happens to be true here, too).

Aside from this comment understating what can be relied on in a way
that seems a bit inaccurate, I worry about RLS leaks made possible by
EPQ, which is a hazard that other MVCC systems naturally don't have,
because unlike Postgres they have statement-level rollback to deal
with READ COMMITTED mode concurrent UPDATEs. Excuse me if I missed it,
but was this something that was considered?

What I'm imagining is a maliciously constructed UPDATE. It's run in
such a way that it could be allowed to UPDATE a row based on that not
being limited by the security barrier quals added by RLS. However,
having followed the UPDATE chain (i.e. just before "goto lreplace"),
the quals now don't pass and ExecUpdate() returns NULL (because of the
security barrier quals not passing EvalPlanQual() recheck on the new
tuple version -- the only possible reason, we'll say). The UPDATE is
a no-op, in the sense that it generates an identical new tuple
(identical to the old/existing one). By enabling log_lock_waits, the
attacker can observe that the new row version does not pass.

Why does this matter, given what is already obviously possible --
repeatedly selecting (or maybe "no-op updating") from the target table
in a READ COMMITTED xact, and seeing what goes away? Well, for one
thing, log_lock_waits will leak the XID on the concurrent (more
privileged) updater. It might then be trivial to reconstruct the
identity of the privileged updater based on some other row in some
other table, where xmin matches that privileged XID. Maybe that's a
hazard not peculiar to Postgres, though -- one can imagine a system
with statement-level rollback also leaking this information, and
perhaps that's just the way things are for every system with RLS, sort
of like constraint-related leaks. What I'm more concerned about is the
way things may not be fully consistent with the command's MVCC
snapshot, and that really is a thing peculiar to Postgres.

Arguable RLS security bug and EvalPlanQual()
====================================

It is certainly conceivable that an UPDATE's USING security barrier
qual contains a subquery referencing a relation that is not the
UPDATE's target relation -- CREATE POLICY is very flexible in
accepting such USING security barrier quals. Does this not introduce
security hazards around other relations being queried using the
command's MVCC snapshot, while the target has an EPQ-installed (only
dirty snapshot visible) tuple? See attached patch, an example
illustrating this arguable security problem using isolationtester. It
could certainly be argued that EvalPlanQual() allows an unacceptable
information leak, and I'd say that at the very least we need to
document this.

If you're using another well known MVCC database system that has RLS,
I imagine when this happens the attacker similarly waits on the
conflicting (privileged) xact to finish (in my example in the patch,
Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
Mallory would then have her malicious UPDATE statement entirely rolled
back, and her statement would acquire an entirely new MVCC snapshot,
to be used by the USING security barrier qual (and everything else)
from scratch. This other system would then re-run her UPDATE with the
new MVCC snapshot. This would repeat until Mallory's UPDATE statement
completes without encountering any concurrent UPDATEs/DELETEs to her
would-be affected rows.

In general, with this other database system, an UPDATE must run to
completion without violating MVCC, even in READ COMMITTED mode. For
that reason, I think we can take no comfort from the presumption that
this flexibility in USING security barrier quals (allowing subqueries,
etc) works securely in this other system. (I actually didn't check
this out, but I imagine it's true).

I'm sorry if I just missed the discussion around this and this sounds
a bit alarmist. I want to satisfy myself that we have this right.

--
Peter Geoghegan

Attachment Content-Type Size
0001-RLS-isolation-test.patch text/x-patch 7.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-06-01 08:58:09 Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Previous Message Michael Paquier 2015-06-01 07:24:26 Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)