Re: UPSERT

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Jonathan Scher" <js(at)oxado(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPSERT
Date: 2007-03-02 18:00:09
Message-ID: 1172858409.3760.1618.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2007-03-02 at 15:41 +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >> My instinct would be to follow your first strategy, i.e. detect which
> >> path is needed rather than try one and then if it fails do the other.
> >
> > The very first thing you need to think about is how to solve the race
> > condition problem, ie, two backends concurrently trying to insert
> > identical data. Until you have a plausible mechanism for that, the
> > whole thing is pie-in-the-sky.
>
> How about:
>
> 1. Insert new heap tuple
> 2. Try to insert the index tuple. If there's a duplicate tuple, lock the
> existing tuple instead of throwing an error.
> 3. If there was no duplicate, we're done.
>
> 4. Otherwise, kill the new tuple inserted in step 1, by setting it's
> xmin to InvalidTransactionId.
> 5. Perform the UPDATE on the existing tuple.
>
> This requires one change to the indexam api: a duplicate key violation
> needs to lock the existing tuple instead of throwing an error.

So if the INSERT fails we will leave two dead copies of the tuples? Hmm.

Seems like we should try to locate a row first, then INSERT if we cannot
find one. That's slower on INSERT but more balanced overall - sometimes
the input will generate all UPDATEs, sometimes all INSERTs we'll never
know.

I'm a bit surprised the TODO didn't mention the MERGE statement, which
is the SQL:2003 syntax for specifying this as an atomic statement. There
are lots of other syntaxes, the most simple of which are the MySQL
REPLACE and Teradata's UPDATE ... ELSE INSERT. As seductive as they are,
I'd say that's all the more reason to go with the single approved
syntax. If MySQL are standards compliant, they will support that also,
so we get MySQL compatibility either way.

Another thought that really ought to be on the TODO is a MERGE FROM
(pick your syntax) that allows MERGE to act like a COPY, reading data
from an external data file. That would save effort, since the only way
of doing this currently is to do a COPY then an UPDATE and then an
INSERT. So the MERGE FROM would reduce three operations to just a single
command.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

  • Re: UPSERT at 2007-03-02 15:41:56 from Heikki Linnakangas

Responses

  • Re: UPSERT at 2007-03-02 18:19:58 from Tom Lane
  • Re: UPSERT at 2007-03-02 18:20:20 from Bricklen Anderson

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-03-02 18:19:58 Re: UPSERT
Previous Message Bruce Momjian 2007-03-02 17:52:51 Patch license update to developer's FAQ