Re: MERGE vs REPLACE

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Dann Corbit <DCorbit(at)connx(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Rick Gigger <rick(at)alpinenetworking(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, Jaime Casanova <systemguards(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: MERGE vs REPLACE
Date: 2005-11-17 15:15:30
Message-ID: 20051117151530.GU6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> This is exactly the point --- pretty much nobody has come to us and
> asked for a feature that does what Peter and Martijn say MERGE does.
> (I haven't bothered to look at the 2003 spec, I'm assuming they read it
> correctly.) What we *have* been asked for, over and over, is an
> insert-or-update feature that's not so tedious and inefficient as the
> savepoint-insert-rollback-update kluge. That's what we ought to be
> concentrating on providing.

I guess to be clear on what this distinction actually is, specifically:
MERGE under SQL2003 doesn't appear to be intended to be used
concurrently. For data warehousing situations this can be just fine
such as in my case where I get a monthly update of some information and
need to merge that update in with the prior information. In this case
there's only one MERGE running and I'd hope it'd be faster than doing
check for existance, insert/update on each row in plpgsql or something
(since there'd be multiple index lookups, etc, I think). Concurrent
MERGEs running *can* fail, just like whole transactions which do the
check/insert/update can fail.

REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
transaction which is supposed to not fail but instead do locking to
ensure that it doesn't fail. This requires predicate locking to be
efficient because you want to tell the concurrent transaction "if you
have the same key as me, just wait a second and you can do an update
'cause I'm going to create the key if it doesn't exist before I'm done".

I think REPLACE/INSERT ON DUPLICATE UPDATE is definitely harder to do
than MERGE because of the idea that it isn't supposed to fail generally.
I think SQL2003 MERGE would be reasonably easy to do and to get the
efficiency benefits out of it (assuming there are some to be had in the
end).

I don't think MERGE can really be made to be both though, in which case
it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
DUPLICATE UPDATE something else. Perhaps a special form of MERGE where
you know it's going to be doing that locking. I really don't like the
idea of making the SQL2003 version of MERGE be the MERGE special case
(by requiring someone to take a table lock ahead of time or do something
else odd).

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2005-11-17 15:25:26 CLUSTER and clustered indices
Previous Message Joshua D. Drake 2005-11-17 15:08:34 Re: Optional postgres database not so optional in 8.1