Re: someone working to add merge?

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>, Jaime Casanova <systemguards(at)gmail(dot)com>
Subject: Re: someone working to add merge?
Date: 2005-11-25 12:14:03
Message-ID: 20051125121402.GB16970@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 24, 2005 at 11:11:34AM -0500, Jan Wieck wrote:
> On 11/24/2005 1:30 AM, Martijn van Oosterhout wrote:
> >Umm, if there are any errors you abort the transaction, just like any
> >other case. ACID requires that either the whole statement is done, or
> >none. If a trigger causes the INSERT or UPDATE to fail you have no
> >choice but to abort the transaction.
>
> I guess you misunderstood. What I am talking about is a problem in the
> order of execution. since we don't have predicate locking, there is a
> possibility that our implementation of MERGE decides to do an INSERT
> while another transaction does the same. What has to happen is that the
> BEFORE INSERT trigger is called, then the heap tuple inserted, then the
> index tuples created. At this time, the duplicate key error occurs,
> telling us that we had a conflict and that we have to try an UPDATE
> instead. That means, in the end this particular row's INSERT has never
> happened and we have to undo the BEFORE INSERT triggers actions too.

But I'm not sure we're supposed to handle that case anyway. Oracle at
least doesn't require an index on the table being merged. And if I look
at it from a visibility view point, if someone else does an INSERT in
another transaction, then MERGE cannot see it and thus it will INSERT
too. This isn't an error.

Consider the case of a deferred unique constraint (Postgres doesn't
support these yet but bear with me). Say in one session you start a
transaction, do a MERGE and then a few other statements. In the
meantime in another session someone inserts a row on the table you
merged. Are you asserting that the first session should undo
everything, do an UPDATE instead of an INSERT and redo all your queries
since? Obviously not.

Though the above relies on something Postgres doesn't support, but you
would be able to emulate the some without a unique key. For example:

Session 1:
CREATE TEMP TABLE foo (id integer, val integer);

BEGIN;
SELECT * FROM foo;

Session 2:
BEGIN;

INSERT INTO foo (1,3);

Session 1:
MERGE INTO foo
USING (SELECT 1)
ON (foo.id = 1)
WHEN MATCHED THEN UPDATE SET val = 1
WHEN NOT MATCHED THEN INSERT (id,val) VALUES (1,2);

Session 2:
COMMIT;

Session 1:
COMMIT;

Now, (IMO) in serializable mode, the MERGE should block on reading the
row inserted by the second session and when that commits do the UPDATE,
thus leaving you with a table with one row (1.1).

In read committed mode, the MERGE shouldn't block and you should end up
with a table with two rows (1,3) and (1,2).

If you switch the order of the insert and merge you should get the same
results in both cases, (1,2) and (1,3).

I think you are arguing that the result should be (1,1) in all cases. I
honestly don't see how that is feasible and certainly not supported by
my reading of the standard. I would be interested to know how other
databases handle the above case.

> Not following the semantics is an error. MERGE is not supposed to do
> multiple inserts for the same match, concurrency or not.

Yes, any single MERGE cannot insert twice. However, two MERGEs running
concurrently, or a MERGE with an INSERT/UPDATE/DELETE in another
session could very well end up with multiple rows on the same key. I
maintain that MERGE has no special rules w.r.t. visibility or locking
and we should not be acting as if it does. If at the end of the
transaction there a duplicate key we should throw the error and let the
client deal with it.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2005-11-25 13:48:16 Re: PL/php in pg_pltemplate
Previous Message Simon Riggs 2005-11-25 10:23:54 Re: gprof SELECT COUNT(*) results