Skip site navigation (1) Skip section navigation (2)

Re: UPSERT

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Andrew Dunstan" <andrew(at)dunslane(dot)net>,"Jonathan Scher" <js(at)oxado(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPSERT
Date: 2007-03-02 18:00:09
Message-ID: 1172858409.3760.1618.camel@silverbirch.site (view raw or flat)
Thread:
Lists: pgsql-hackers
On Fri, 2007-03-02 at 15:41 +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >> My instinct would be to follow your first strategy, i.e. detect which 
> >> path is needed rather than try one and then if it fails do the other.
> > 
> > The very first thing you need to think about is how to solve the race
> > condition problem, ie, two backends concurrently trying to insert
> > identical data.  Until you have a plausible mechanism for that, the
> > whole thing is pie-in-the-sky.
> 
> How about:
> 
> 1. Insert new heap tuple
> 2. Try to insert the index tuple. If there's a duplicate tuple, lock the 
> existing tuple instead of throwing an error.
> 3. If there was no duplicate, we're done.
> 
> 4. Otherwise, kill the new tuple inserted in step 1, by setting it's 
> xmin to InvalidTransactionId.
> 5. Perform the UPDATE on the existing tuple.
> 
> This requires one change to the indexam api: a duplicate key violation 
> needs to lock the existing tuple instead of throwing an error.

So if the INSERT fails we will leave two dead copies of the tuples? Hmm.

Seems like we should try to locate a row first, then INSERT if we cannot
find one. That's slower on INSERT but more balanced overall - sometimes
the input will generate all UPDATEs, sometimes all INSERTs we'll never
know.


I'm a bit surprised the TODO didn't mention the MERGE statement, which
is the SQL:2003 syntax for specifying this as an atomic statement. There
are lots of other syntaxes, the most simple of which are the MySQL
REPLACE and Teradata's UPDATE ... ELSE INSERT. As seductive as they are,
I'd say that's all the more reason to go with the single approved
syntax. If MySQL are standards compliant, they will support that also,
so we get MySQL compatibility either way.

Another thought that really ought to be on the TODO is a MERGE FROM
(pick your syntax) that allows MERGE to act like a COPY, reading data
from an external data file. That would save effort, since the only way
of doing this currently is to do a COPY then an UPDATE and then an
INSERT. So the MERGE FROM would reduce three operations to just a single
command. 

-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com



In response to

  • Re: UPSERT at 2007-03-02 15:41:56 from Heikki Linnakangas

Responses

  • Re: UPSERT at 2007-03-02 18:19:58 from Tom Lane
  • Re: UPSERT at 2007-03-02 18:20:20 from Bricklen Anderson

pgsql-hackers by date

Next:From: Tom LaneDate: 2007-03-02 18:19:58
Subject: Re: UPSERT
Previous:From: Bruce MomjianDate: 2007-03-02 17:52:51
Subject: Patch license update to developer's FAQ

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group