Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid
Date: 2011-01-03 17:50:39
Message-ID: AANLkTimQ01TYQBCA1kPF=uMQEpdrW5v2u1tN4+HwpM7b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 3, 2011 at 12:01 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> If we do that, then we definitely need a catch-all WHEN statement, so
>> that we can say
>>
>> WHEN NOT MATCHED
>>   INSERT
>> WHEN MATCHED
>>   UPDATE
>> ELSE
>>   { INSERT into another table so we can try again in a minute
>>  or RAISE error }
>>
>> Otherwise we will silently drop rows. Throwing an error every time isn't
>> useful behaviour.
>
> An ELSE clause would be nice, but it's not related to the question at hand.
> Only some serialization anomalities result in a row that matches neither
> WHEN MATCHED nor WHEN NOT MATCHED. Others result in a duplicate key
> exception, for example.

I must be missing something. A source row is either matched or not
matched. ELSE doesn't exist because the writers of the spec thought
there might be some third matched-invisible-row case, but rather
because you might have written WHEN [NOT MATCHED] AND <some qual>, and
you might fall through a list of all such clauses.

I think we're focusing on the wrong problem here. MERGE creates some
syntax to let you do with SQL something that people are currently
doing with application-side logic. I've written the application-side
logic to do this kind of thing more times than I care to remember, and
yeah there are concurrency issues, but:

- sometimes there's only one writer, so it doesn't matter
- sometimes there can technically be more than one writer, but the
usage is so low that nothing actually breaks
- sometimes you know that a given writer will only operate on rows
customer_id = <some constant>; so you only need to prevent two
concurrent writers *for the same customer*, not any two concurrent
writers
- and sometimes you need a full table lock.

The third case, in particular, is quite common in my experience, and a
very good reason not to impose a full table lock. Users hate having
to do explicit locking (especially users whose names rhyme with Bevin
Bittner) but they hate *unnecessary* full-table locks even more. A
problem that you can fix by adding a LOCK TABLE statement is annoying;
a problem that you can fix only be removing an implicit lock table
operation that the system performs under the hood is a lot worse. In
the fourth case above, which IME is quite common, you could EITHER
take a full-table lock, if that performs OK, OR you could arrange to
take an advisory lock that protects the records for the particular
customer whose data you want to update. If we always take a
full-table lock, then the user loses the option to do something else.

The point we ought to be focusing on is that the patch doesn't work.
Unless someone is prepared to put in some time to fix THAT, the rest
of this discussion is academic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2011-01-03 18:08:17 Re: back branches vs. VS 2008
Previous Message Simon Riggs 2011-01-03 17:48:57 Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid