Re: MERGE Specification

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Decibel! <decibel(at)decibel(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE Specification
Date: 2008-04-24 19:28:19
Message-ID: 1209065299.4259.1641.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2008-04-24 at 12:19 -0400, Tom Lane wrote:
> Decibel! <decibel(at)decibel(dot)org> writes:
> > That really strikes me as taking the "MySQL route". If push comes to
> > shove, I'll take a MERGE with race conditions over no merge at all,
> > but I think it's very important that it does the right thing. Just
> > because the spec doesn't say anything about it doesn't mean it's ok.
>
> Agreed. It seems to me that in the last set of discussions, we rejected
> implementing MERGE precisely because it failed to provide a solution to
> the race-condition problem. I'm not satisfied with a version that
> doesn't handle that, because I think that is *exactly* what most people
> will try to use it for. The non-concurrent bulk update case that Simon
> is arguing for is the uncommon usage.

If y'all think that, then I will do it that way.

The only protection from the race condition is to do the Insert first.

With MERGE, we would need to do it like this:

1. If there are any WHEN NOT MATCHED clauses that trigger INSERTs, then
attempt to apply them first, no matter what order they were in with
respect to the WHEN MATCHED clauses. Start loop at step (3) every time.
If there aren't any, start loop straight at step (5). Note that we would
need to check to see that INSERTs had not been removed by Rules.

2. For each row retrieved by outer join, goto either step 3 or 5 as
established before the loop starts.

3. Try to apply the WHEN NOT MATCHED clauses. The ordering of the
clauses with respect to each other will remain exactly as stated. If one
of the clauses activates an INSERT, we start an internal subtransaction
and perform the INSERT action. If it succeeds, we commit the
subtransaction and continue.

4. If the INSERT fails with a uniqueness violation, we shrug. The ERROR
has caused the subtransaction to abort.

5. Process WHEN MATCHED clauses and continue.

Technically, this is a standards violation because of the potentially
out-of-order execution of the when clauses. Though the end result is not
distinguishable from standards compliant behaviour, AFAICS.

Note that in step 3 we *must* use subtransactions if there is more than
1 unique index on a table, otherwise we might succeed with the first
index and fail with the second. Using a subtransaction per row pretty
much rules out an efficient bulk load.

Note also that this results in a version optimised for INSERT. If we end
up doing an UPDATE there will be two dead rows, probably in two separate
blocks. We hope that doesn't matter because of HOT.

There's probably a reasonable argument for having an optional keyword to
make MERGE behave differently for bulk loads, but I'll save that now for
another day. Focus now is on a command that works well for OLTP cases.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-04-24 19:52:10 Re: Proposed patch - psql wraps at window width
Previous Message Bruce Momjian 2008-04-24 19:22:16 Re: Proposed patch - psql wraps at window width