Re: [HACKERS] MERGE SQL Statement for PG11

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-01 12:33:11
Message-ID: CABOikdNnJcrEbbZ6+9PGP=czFOi7Wz46T0mK8Cr0KHngoWJT6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 1, 2018 at 3:50 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Fri, Feb 9, 2018 at 6:36 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Here's my $0.02: I think that new concurrency errors thrown by the
> > merge code itself deserve strict scrutiny and can survive only if they
> > have a compelling justification, but if the merge code happens to
> > choose an action that leads to a concurrency error of its own, I think
> > that deserves only mild scrutiny.
> >
> > On that basis, of the options I listed in
> > http://postgr.es/m/CA+TgmoZDL-caukHkWet7sr7sqr0-e2T91+
> DEvhqeN5sfqsMjqw(at)mail(dot)gmail(dot)com
> > I like #1 least.
> >
> > I also dislike #4 from that list for the reasons stated there. For
> > example, if you say WHEN MATCHED AND x.some_boolean and then WHEN
> > MATCHED, you expect that every tuple that hits the latter clause will
> > have that Boolean as false or null, but #4 makes that not true.
> >
> > I think the best options are #2 and #5 -- #2 because it's simple, and
> > #5 because it's (maybe) more intuitive, albeit at the risk of
> > livelock. But I'm willing to convinced of some other option; like
> > you, I'm willing to be open-minded about this. But, as you say, we
> > need a coherent, well-considered justification for the selected
> > option, not just "well, this is what we implemented so you should like
> > it". The specification should define the implementation, not the
> > reverse.
>
> At first I hated option #2. I'm warming to #2 a lot now, though,
> because I've come to understand the patch's approach a little better.
> (Pavan and Simon should still verify that I got things right in my
> mail from earlier today, though.)
>
> It now seems like the patch throws a RC serialization error more or
> less only due to concurrent deletions (rarely, it will actually be a
> concurrent update that changed the join qual values of our MERGE).
> We're already not throwing the error (we just move on to the next
> input row from the join) when we happen to not have a WHEN NOT MATCHED
> case. But why even make that distinction? Why not just go ahead and
> WHEN NOT MATCHED ... INSERT when the MERGE happens to have such a
> clause? The INSERT will succeed, barring any concurrent conflicting
> insertion by a third session -- a hazard that has nothing to do with
> RC conflict handling in particular.
>
> Just inserting without throwing a RC serialization error is almost
> equivalent to a new MVCC snapshot being acquired due to a RC conflict,
> so all of this now looks okay to me. Especially because every other
> MERGE implementation seems to have serious issues with doing anything
> too fancy with the MERGE target table's quals within the main ON join.
> I think that SQL Server actually assumes that you're using the
> target's PK in a simple equi-join. All the examples look like that,
> and this assumption is heavily suggested by the "Do not attempt to
> improve query performance by filtering out rows in the target table in
> the ON clause" weasel words from their docs, that I referenced in my
> mail from earlier today.
>
>
I think you've fairly accurately described what the patch does today. I
take your point that we can very well just execute the WHEN NOT MATCHED
action if the join condition fails for the updated tuple. There is one case
we ought to think about though and that might explain why executing the
WHEN NOT MATCHED action may not be best choice. Or for that matter even
skipping the target when no NOT MATCHED action exists, as the patch does
today.

What if the updated tuple fails the join qual with respect to the current
tuple from the source relation but it now matches some other tuple from the
source relation? I described this case in one of the earlier emails too. In
this case, we might end up doing an INSERT (if we decide to execute WHEN
NOT MATCHED action), even though a MATCH exists. If there is no WHEN NOT
MATCHED action, the current patch will just skip the updated tuple even
though a match exists, albeit it's not the current source tuple.

Oracle behaves differently and it actually finds a new matching tuple from
the source relation and executes the WHEN MATCHED action, using that source
tuple. But I am seriously doubtful that we want to go down that path and
whether it's even feasible. Our regular UPDATE .. FROM does not do that
either. Given that, it seems better to just throw an error (even when no
NOT MATCHED action exists) and explain to the users that MERGE will work as
long as concurrent updates don't modify the columns used in the join
condition. Concurrent deletes should be fine and we may actually even
invoke WHEN NOT MATCHED action in that case.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-03-01 12:40:01 Re: inserts into partitioned table may cause crash
Previous Message Alexander Korotkov 2018-03-01 12:23:31 Re: [HACKERS] [PATCH] Incremental sort