Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Date: 2015-04-17 08:17:22
Message-ID: CAEZATCVDdYRFbF4NMXTF-NKYibbR2VSfNXRWPyyByaCpV1jwEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9 April 2015 at 22:18, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> The big idea (the fine details of which Stephen appeared to be in
> tentative agreement with [1]) is that an UPSERT is a hybrid between an
> INSERT and an UPDATE, and not simply an INSERT and separate UPDATE
> tied together. So at the very least the exact behavior of such a
> hybrid cannot be assumed to be any particular way just from
> generalizing from known behaviors for INSERT and UPDATE (which is a
> usability issue, since the fine details of RLS are already complex
> enough without UPSERT).
>

I think that you're over-complicating this. From a usability point of
view, I think that the best approach is to keep this as simple as
possible and make the behaviour consistent with an INSERT and an
UPDATE tied together, as is suggested by the new syntax. The key point
is that, if you are using the RLS INSERT and UPDATE policies for this
new command, the rule should be that the user has permission to
insert/update a new/existing row if and only if they would have had
permission to do so using a stand-alone INSERT/UPDATE command.

On the other hand, if you believe that the behaviour should be
something other than that, then I think that it would need a new
dedicated kind of RLS policy for this command because, as I will
attempt to explain below, merging together the quals from INSERT and
UPDATE policies leads to logical problems.

> The INSERT policies are only enforced when a tuple is inserted
> because, when the alternative path isn't taken then it's really just
> an INSERT.
>

Agreed.

> For the UPDATE path, where the stickiness/hybridness begins...
> <snip>
> I'd appreciate it if you explicitly outlined what policies you feel
> should be enforce at each of the 3 junctures within an UPSERT (post
> INSERT, pre-UPDATE, post-UPDATE). I would also like you to be very
> explicit about whether or not RLS WITH CHECK policies and USING quals
> (presumably enforced as RLS WITH CHECK policies) from both INSERT and
> UPDATE policies should be enforced at each point. In particular,
> should I ever not treat RLS WCO and USING quals equivalently? (recall
> that we don't want to elide an UPDATE silently, which makes much less
> sense for UPSERT...I had assumed that whatever else, we'd always treat
> WCO and USING quals from UPDATE (and ALL) policies equivalently, but
> perhaps not).

OK, I'll try to explicitly outline how I think this ought to work:

For INSERTs and UPDATEs, there are 3 kinds of RLS quals that come into play:

1). insertCheckQuals - the logical OR of the quals from all INSERT
WITH CHECK policy clauses. These give the user permission to insert
into the table, provided that the new rows satisfy at least one of
these quals.

2). updateUsingQuals - the logical OR of the quals from all UPDATE
USING policy clauses. These give the user permission to update
existing rows in the table, where the existing rows satisfy at least
one of these quals.

3). updateCheckQuals - the logical OR of the quals from all UPDATE
WITH CHECK policy clauses. These give the user permission to update
the table, provided that the new rows satisfy at least one of these
quals.

In general these may all differ from one another.

If we go forward with the idea that RLS quals should be checked before
attempting to insert/update any data, as we do for regular permission
checks, then stand-alone INSERT and UPDATE commands would work
conceptually as follows:

INSERT:
1. Check NEW against insertCheckQuals (error if the result is not true)
2. Do the actual insert of NEW

UPDATE:
1. Check OLD against updateUsingQuals (skip rows that don't match)
2. Check NEW against updateCheckQuals (error if the result is not true)
3. Do the actual update (OLD -> NEW)

Of course that's an over-simplification. The updateUsingQuals are
actually merged with the user-supplied WHERE clause in a way dependent
on the presence of leaky functions, but conceptually it matches the
above description.

I think that there is universal agreement that an INSERT .. ON
CONFLICT UPDATE that follows the insert path ought to match the pure
INSERT case, and only check the insertCheckQuals. That just leaves the
question of what to do on the update path, where things get more
complicated because there are 3 tuples involved in the process:

1). t1 - the tuple originally proposed for insertion, but which could
not be inserted due to a conflict with an existing row in the table
(aka EXCLUDED).

2). t2 - the existing row in the table that prevented t1 from being
inserted (aka TARGET).

3). t3 - the final new row resulting from the update path. In general,
we allow this to be quite different from t1 and t2.

If we think of INSERT .. ON CONFLICT UPDATE as an INSERT and an UPDATE
tied together, then logically the following would happen if the update
path were taken:

INSERT .. ON CONFLICT UPDATE (update path):
1. Check t1 against insertCheckQuals (error if not true)
2. Detect that t1 conflicts with t2
3. Test user-supplied auxiliary WHERE clause (skip if not true)
4. Check t2 against updateUsingQuals (error if not true)
5. Check t3 against updateCheckQuals (error if not true)
6. Do the actual update (t2 -> t3)

Step (4) is the only point where this differs from an INSERT and an
UPDATE tied together, and only in the fact that it throws an error
rather than skipping the row to be updated if the user doesn't have
permission to do so. The important point is that this is consistent
with the rule that it gives the user permission to insert/update a
new/existing row if and only if they would have been allowed to do so
using a stand-alone INSERT/UPDATE command.

Note that there are 3 participating tuples, and each is checked
against precisely one of the 3 relevant policies. That is important,
as I will attempt to explain.

As it stands, your patch classifies WCOs by command type and combines
them in a way that results in 4 additional RLS checks being performed
(not really in this order):

5.1. Check t2 against insertCheckQuals (error if not true)
5.2. Check t2 against updateCheckQuals (error if not true)
5.3. Check t3 against insertCheckQuals (error if not true)
5.4. Check t3 against updateUsingQuals (error if not true)

Some of those additional checks really don't make any sense, e.g.,
(5.2) prevents you from updating an existing tuple if you wouldn't
have had permission to make an update that resulted in that tuple,
which just seems completely backwards.

But the real problem is that by applying multiple checks to the same
tuple, you're implicitly ANDing together the quals from different RLS
policy types and the problem with that is that it's possible for the
quals from different policy types to be completely incompatible (their
logical AND may be identically equivalent to false). So by doing these
additional checks it may be impossible for any tuple to ever satisfy
all the checks at the same time, making the command unusable. So the
general rule, as I alluded to above, should be that no 2 different
kinds of RLS quals should ever be applied to the same tuple.

The nub of the problem is that you're classifying WCOs by command
type, which is wrong because updateUsingQuals have different semantics
from updateCheckQuals, even though they both originate from the UPDATE
policy. If you factor in updatable SB views, which we might hope to
one day have support for INSERT .. ON CONFLICT UPDATE, there will be
additional WCOs arising from the view, which will have different
semantics again (the SQL spec says that they should apply after the
insert/update to ensure that the final result is visible in the view).
These are then another separate kind of WCO (that doesn't depend on
the command type). I think this will actually be quite
straightforward, as long as all the different kinds of WCO aren't
lumped together.

If you take a look at my patch to apply RLS checks before
insert/update, you'll see that I've added a WCOKind enum that allows
each of these kinds of WCOs to be treated differently and checked at
different stages in the executor. I anticipated that INSERT .. ON
CONFLICT UPDATE would extend that by adding a new kind of WCO for
updateUsingQuals checked as if they were WCOs, allowing them to be
checked at a specific point in the new code (step (4) above).

In all of this, I think we should try to keep things as simple as
possible, to give the end user a chance to understand it --- although
I'm not sure I've achieved that with my explanation :-)

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-04-17 08:30:56 Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Previous Message Michael Paquier 2015-04-17 07:23:38 Re: Moving on to close the current CF 2015-02