Re: ask for review of MERGE

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 19:40:07
Message-ID: AANLkTikF4JH9=xu51XLBeWDPqXYpRLn1YFt6gvQBVCqf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 3:15 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> rhaas=# create table concurrent (x integer primary key);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "concurrent_pkey" for table "concurrent"
>> CREATE TABLE
>> rhaas=# insert into x values (1);
>> rhaas=# begin;
>> BEGIN
>> rhaas=# insert into concurrent values (2);
>> INSERT 0 1
>>
>> <switch to a different window>
>>
>> rhaas=# update concurrent set x=x where x=2;
>> UPDATE 0
>
> That surprised me.  I would have thought that the INSERT would have
> created an "in doubt" tuple which would block the UPDATE.  What is
> the reason for not doing so?

This is just standard MVCC - readers don't block writers, nor writers
readers. You might also think about what would happen if the UPDATE
were run before the INSERT of (2). There's no serialization anomaly
here, because either concurrent case is equivalent to the serial
schedule where the update precedes the insert.

In the case of a MERGE that matches a just-inserted invisible tuple
but no visible tuple, things are a bit stickier. Let's suppose we're
trying to use MERGE to get UPSERT semantics. If the MERGE command has
the obvious behavior of ignoring the invisible tuple just as UPDATE or
DELETE would do, then clearly any equivalent serial schedule must run
the MERGE before the INSERT (because if it were run after the INSERT,
it would fire the MATCHED action rather than the NOT MATCHED action).
But if the merge were run before the INSERT, then the INSERT would
have failed with a unique key violation; instead, the merge fails with
a unique key violation. On the other hand, if the MERGE sees the
invisible tuple, essentially using SnapshotNow semantics, as Greg
Stark proposed, you get a different (and probably worse) class of
serialization anomalies. For example, suppose the table has rows
1-100 and you do an update adding 1000 to each value concurrently with
merging in the values 51-100. You might get something like this:

- MERGE scans rows 1-75, firing MATCHED for rows 51-75.
- UPDATE commits.
- MERGE scans rows 76-100, firing NOT MATCHED for each.

Now, as Greg says, that might be what some people want, but it's
certainly monumentally unserializable. In a serial execution
schedule, the MERGE will either run before the UPDATE, in which case
MATCHED will fire for rows 51-100, or else the UPDATE will run before
the MERGE, in which case NOT MATCHED will fire for rows 51-100. No
serial schedule is going to fire MATCHED for some rows and NOT MATCHED
for others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-10-25 19:43:05 Re: foreign keys for array/period contains relationships
Previous Message Greg Stark 2010-10-25 19:34:45 Re: foreign keys for array/period contains relationships