Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-19 03:49:50
Message-ID: CAM3SWZTwwQqSA3Oqes3niNJ_zHQ+Si_8ZUgipL5ho1LL+3UJSQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 18, 2014 at 6:17 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> MySQL users are not notified that this happened, and are probably
> blissfully unaware that there has been a limited form of data loss. So
> it's The Right Thing to say to Postgres users: "if you inserted these
> rows into the table when it was empty, there'd *still* definitely be a
> unique constraint violation, and you need to sort that out before
> asking Postgres to handle conflicts with concurrent sessions and
> existing data, where rows that come from earlier commands in your xact
> count as existing data".
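
To make that scenario concrete (a sketch; the table, columns and
values are invented, using the syntax proposed here):

    CREATE TABLE tab (key int PRIMARY KEY, val text);

    -- Against an empty table, these two rows conflict with each
    -- other, so this must raise a unique constraint violation rather
    -- than silently discarding one of them:
    INSERT INTO tab (key, val) VALUES (1, 'a'), (1, 'b')
        ON DUPLICATE KEY LOCK FOR UPDATE;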

I Googled and found evidence indicating that a number of popular
proprietary systems' SQL MERGE implementations do much the same thing.
You may get an "attempt to UPDATE the same row twice" error on both
SQL Server and Oracle. I wouldn't like to speculate on whether the
standard requires this of MERGE, but requiring it seems very sensible.
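
For instance, something like this draws that error on those systems (a
sketch from memory; the table and values are invented, and the exact
error text varies by product):

    -- Two source rows match the one target row with id = 1, so the
    -- MERGE would have to update it twice:
    MERGE INTO target t
    USING (VALUES (1, 'a'), (1, 'b')) AS s (id, val)
        ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET val = s.val;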

> The only problem I can see with that is that
> we cannot complain consistently for practical reasons, as when we lock
> *some other* xact's tuple rather than inserting in the same command
> two or more times.

Actually, maybe it would be practical to complain, with at least
*almost* total accuracy, when the same UPSERT command attempts to lock
a row twice, and not just in the particularly problematic case where
tuple visibility is not assured.

Personally, I favor just making "case HeapTupleSelfUpdated:" within
the patch's ExecLockHeapTupleForUpdateSpec() function complain when
"hufd.cmax == estate->es_output_cid" (currently there is a separate
complaint, but only when those two variables are unequal). That's
probably almost perfect in practice.
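
In rough outline (just a sketch, not the patch's actual code; the
error message wording is made up):

    case HeapTupleSelfUpdated:
        /*
         * The tuple was already updated or locked by our own
         * transaction.  If our own *command* did it (hufd.cmax
         * matches this command's CommandId), complain rather than
         * proceeding.
         */
        if (hufd.cmax == estate->es_output_cid)
            ereport(ERROR,
                    (errcode(ERRCODE_CARDINALITY_VIOLATION),
                     errmsg("command attempted to lock the same row twice")));
        break;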

If we wanted perfection, which would be to always complain when two
rows were locked by the same UPSERT command, it would be a matter of
having heap_lock_tuple indicate to the patch's
ExecLockHeapTupleForUpdateSpec() caller that the row was already
locked, so that it could complain in a special way for the
locked-not-updated case. But that is hard, because there is no way for
it to know if the current *command* locked the tuple, and that's the
only case that we are justified in raising an error for.

But now that I think about it some more, maybe always complaining when
we lock but have not yet updated is not just more trouble than it's
worth, but may actually be bogus, since it's not obvious what precise
behavior is correct here. I was worried about someone updating
something twice, but maybe it's fully sufficient to do what I've
already proposed, while also documenting that you cannot
on-duplicate-key-lock a tuple that has already been inserted or
updated within the same command. It will be very rare for anyone to
trip over that in practice (e.g. by locking the same row twice and
then spuriously updating it twice or more in a later command). Users
learn not to try this kind of thing by having it break immediately;
the fact that it doesn't break with 100% reliability is good enough
(and since the restriction is documented, it doesn't *really* fail to
break when it should).

--
Peter Geoghegan
