Re: INSERT ... ON CONFLICT () SELECT

From: Matt Pulver <mpulver(at)unitytechgroup(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... ON CONFLICT () SELECT
Date: 2017-06-18 11:33:51
Message-ID: CAHiCE4XHu=7EoupTTqVT+XPQDweKAK1-+Wt2AuSp-AXnKSr8eA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 17, 2017 at 9:55 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Sat, Jun 17, 2017 at 7:49 AM, Matt Pulver <mpulver(at)unitytechgroup(dot)com>
> wrote:
> > With the proposed "INSERT ... ON CONFLICT () SELECT" feature, the
> > get_or_create_id() function is simplified to:
>
> Are you locking the existing rows? Because otherwise, the
> determination that they're conflicting can become obsolete immediately
> afterwards. (I guess you would be.)
>

If that is required in order to return the rows in their conflicted state,
then yes.

> The problem with this design and similar designs is that presumably
> the user is sometimes projecting the conflicting rows with the
> intention of separately updating them in a wCTE. That might not work,
> because only ON CONFLICT doesn't use the MVCC snapshot, in order to
> ensure that an UPDATE is guaranteed when an INSERT cannot go ahead.
> That isn't what you're doing in the example you gave, but surely some
> users would try to do things like that, and get very confused.
>

Ultimately the proposed "INSERT ... ON CONFLICT () DO SELECT" syntax is
still an INSERT statement, not a SELECT, so a user should not expect rows
returned from it to be available for UPDATE/DELETE in another part of a
wCTE. Anyone who understands this behavior for an INSERT statement, let
alone the current "INSERT ... ON CONFLICT DO UPDATE" should not be too
surprised if the same thing applies to the new "INSERT ... ON CONFLICT DO
SELECT".

> I think that what you propose to do here would likely create a lot of
> confusion by mixing MVCC semantics with special UPSERT visibility
> semantics ("show me the latest row version visible to any possible
> snapshot for the special update") even without a separate UPDATE, in
> fact. Would you be okay if "id" appeared duplicated in the rows you
> project in your new syntax, even when there is a separate unique
> constraint on that column? I suppose that there is some risk of things
> like that today, but this would make the "sleight of hand" used by ON
> CONFLICT DO UPDATE more likely to cause problems.
>

Good point. Here is an example using the example table from my previous
email:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO SELECT
RETURNING *

Here are a couple options of how to handle this:

1) Return two identical rows (with the same id).
2) Produce an error, with error message:
"ERROR: ON CONFLICT DO SELECT command cannot reference row a second time
HINT: Ensure that no rows proposed for insertion within the same command
have duplicate constrained values."

This would be nearly identical to the existing error message that is
produced when running:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO UPDATE SET value=1
RETURNING *

which gives the error message:
"ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command
have duplicate constrained values."

Technically, an error doesn't *need* to be produced for the above "ON
CONFLICT DO SELECT" statement - I think it is still perfectly well-defined
to return duplicate rows as in option 1. Option 2 is more for the benefit
of the user who is probably doing something wrong by attempting to INSERT a
set of rows that violate a constraint. What do you think would be best?

Thank you for the discussion.

Best regards,
Matt

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Barai 2017-06-18 11:48:23 Re: GSoC 2017 : Patch for predicate locking in Gist index
Previous Message Julien Rouhaud 2017-06-18 07:16:07 Typo in insert.sgml