Re: MERGE vs REPLACE

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Petr Jelinek <pjmodos(at)seznam(dot)cz>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd(at)commandprompt(dot)com
Subject: Re: MERGE vs REPLACE
Date: 2005-11-15 22:11:34
Message-ID: 1132092694.3388.82.camel@holly
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2005-11-15 at 10:27 -0800, Josh Berkus wrote:

> > The UPSERT concept is also supported by Teradata, who simply append an
> > ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
> > to me to be a fairly small subset of MERGE functionality and we ought to
> > be able to offer that functionality as a side branch of the main work.
>
> Yes, I guess my hesitation on the full-table-lock strategy is that it
> doesn't really fulfill the mandate for why people want REPLACE-like
> statements ... to give them an INSERT-or-UPDATE with *higher* efficiency
> and concurrency than doing two statements.

Agreed...

OK, what I said was "...Where there is doubt, we should fall back to
table locking just like the rest of the world, IMHO." I didn't mean we
should use full table locking all of the time. Sorry if I wasn't clear.

What I meant, in context was, IMHO
- we do no need predicate locking
- we should use row level locks when these can be used
- we should use table level locks other times

On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote:
> Scenario:
>
> session1: REPLACE .... 1
> session2: REPLACE ..... 1
> session1: check to see that "1" exists .... no
> session2: check to see that "1" exists .... no
> session1: INSERT 1
> session2: INSERT 1 .... ERROR
>
> Get the picture? The only way to avoid a race condition is to be able to
> do "predicate locking", that is to lock the table against any data write
> matching that predicate.

The above situation happens now if you have two people doing SELECT then
INSERT on the same row. Or even if you do UPDATE then INSERT.

Currently if two people INSERT a row with a duplicate PK, we may find
that both INSERTs made it into a data block, but one has its txn aborted
because of a unique index violation.

Nobody complains about that, nor says we should have predicate locking
for that case, so why worry about it for MERGE? I don't see we need a
special effort to avoid: if you try to do the same thing more than once,
only the first one will work. At least you get a nice error message to
let you know its happened.

Seems like we
- run the query in the USING clause
- join it to the target table using pseudo outer join logic
- if it matches we UPDATE
- if it doesn't we INSERT
...but we do the last two in a single step, to minimise the window of
opportunity for wierd situations.

> That being said, I've
> personally designed more than a dozen web applications and have not yet
> been faced with a single circumstance of not knowing whether I wanted to
> INSERT or UPDATE. I've even ported MySQL apps and found it easy to
> re-code them to do "if $id = 0, then insert ..." without even needing to
> use a pl/pgsql hack.

OK, but not everybody knows what they're doing as well as you do. :-)

> So we thus have two seperate use cases. The first, for bulk loading/ETL is
> what MERGE fulfills rather neatly and for that full table locking is
> perfectly OK, even desirable. You really don't want to MERGE-load the
> same table on two threads at once.

Probably true, but I do want to avoid full table locking for MERGE
whenever possible. Concurrency is important even in data warehousing.

Best Regards, Simon Riggs

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2005-11-15 22:36:33 Re: [HACKERS] Per-table freeze limit proposal
Previous Message Bruce Momjian 2005-11-15 21:58:48 Re: server closed connection on a select query