Re: ask for review of MERGE

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 20:58:15
Message-ID: AANLkTikdP6c_Mw_1mZUtSzidXyWt2aT_5HL5scU=KHCq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Now, as Greg says, that might be what some people want, but it's
>> certainly monumentally unserializable.
>
> To be clear when I said it's what people want what I meant was that in
> the common cases it's doing exactly what people want. As opposed to
> getting closer to what people want in general but not quite hitting
> the mark in the common cases.
>
> Just as an example I think it's important that in the simplest case,
> upsert of a single record, it be 100% guaranteed to do the naive
> upsert. If two users are doing the merge of a single key at the same
> time one of them had better insert and one of them had better update
> or else users are going to be monumentally surprised.

Hmm, so let's think about that case.

The first merge comes along and finds no match so it fires the NOT
MATCHED rule, which inserts a tuple. The second merge comes along and
finds no match, so it also fires the NOT MATCHED rule and tries to
insert a tuple. But upon consulting the PRIMARY KEY index it finds
that an in-doubt tuple exists so it goes to sleep waiting for the
first transaction to commit or abort. If the first transaction
commits it then decides that the jig is up and fails. We could
(maybe) fix this by doing something similar to what EPQ does for
updates: when the first transaction commits, instead of redoing the
insert, we back up and recheck whether the new tuple would have
matched the join clause and, if so, we instead fire the MATCHED action
on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not
sure how hard that would be, or whether it would introduce any other
nasty anomalies in more complex cases.

Alternatively, we could introduce an UPSERT or REPLACE statement
intended to handle exactly this case and leave MERGE for more complex
situations. It's pretty easy to imagine what the coding of that
should look like: if we encounter an in-doubt tuple in we wait on its
xmin. If the transaction aborts, we insert. If it commits, and we're
in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
or SERIALIZABLE mode, we abort with a serialization error. That's a
lot simpler to understand and reason about than MERGE in its full
generality.

I think it's pretty much hopeless to think that MERGE is going to work
in complex concurrent scenarios without creating serialization
anomalies, or at least rollbacks. I think that's baked into the
nature of what the statement does. To simulate MERGE, you need to
read from the target table and then do writes that depend on what you
read. If you do that with the commands that are available today,
you're going to get serialization anomalies and/or rollbacks under
concurrency. The mere fact of that logic being inside the database
rather than outside isn't going to make that go away. Now sometimes,
as with exclusion constraints, you can play games with dirty snapshots
to get the semantics you want, but whether that's possible in a
particular case depends on the details of the operation being
performed, and here I think it can't be done. Some operations are
*fundamentally* unserializable.

A very simple example of this is a sequence that is guaranteed not to
have gaps (a feature we've occasionally been requested to provide).
If N processes request a sequence number simultaneously, you have to
hand out a value to the first guy and wait and see whether he commits
or aborts before deciding what number to give the second guy. That
sucks, so usually we just design our applications not to require that
sequences be gap-free. Similarly here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-10-25 21:01:08 Re: Floating-point timestamps versus Range Types
Previous Message Merlin Moncure 2010-10-25 20:51:02 Re: Postgres insert performance and storage requirement compared to Oracle