Re: [HACKERS] MERGE SQL Statement for PG11

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-01-24 04:12:16
Message-ID: CANP8+jKTuZ5NDCLkqXBm9kMeaT-jNxxhkMg+FOpLni=pwOQ=BA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24 January 2018 at 01:35, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Jan 23, 2018 at 5:51 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Not yet working
>>> * Partitioning
>>> * RLS
>>>
>>> Based on this successful progress I imagine I'll be looking to commit
>>> this by the end of the CF, allowing us 2 further months to bugfix.
>>
>> This is complete and pretty clean now. 1200 lines of code, plus docs and tests.
>
> That timeline seems aggressive to me.

Thank you for the review.

> Also, the patch appears to have
> bitrot. Please rebase, and post a new version.

Will do, though I'm sure that's only minor since we rebased only a few days ago.

> Some feedback based on a short read-through of your v11:
>
> * What's the postgres_fdw status?

MERGE currently works with normal relations and materialized views only.

> * Relatedly, when/where does applying the junkfilter happen, if not here?:

A few lines later in the same file. Junkfilters are still used.

> * Isn't "consider INSERT ... ON CONFLICT DO UPDATE" obsolete in these
> doc changes?:
>
>> -F312 MERGE statement NO consider INSERT ... ON CONFLICT DO UPDATE
>> -F313 Enhanced MERGE statement NO
>> -F314 MERGE statement with DELETE branch NO
>> +F312 MERGE statement YES consider INSERT ... ON CONFLICT DO UPDATE
>> +F313 Enhanced MERGE statement YES
>> +F314 MERGE statement with DELETE branch YES

I think that it is still useful to highlight the existence of a
non-standard feature which people might not be otherwise unaware.

If you wish me to remove a reference to your work, I will bow to your wish.

> * What's the deal with transition tables? Your docs say this:
>
>> + <varlistentry>
>> + <term><replaceable class="parameter">source_table_name</replaceable></term>
>> + <listitem>
>> + <para>
>> + The name (optionally schema-qualified) of the source table, view or
>> + transition table.
>> + </para>
>> + </listitem>
>> + </varlistentry>
>
> But the code says this:
>
>> + /*
>> + * XXX if we support transition tables this would need to move earlier
>> + * before ExecSetupTransitionCaptureState()
>> + */
>> + switch (action->commandType)

Changes made by MERGE can theoretically be visible in a transition
table. When I tried to implement that there were problems caused by
the way INSERT ON CONFLICT has been implemented, so it will take a
fair amount of study and lifting to make that work. For now, they are
not supported and generate an error message. That behaviour was not
documented, but I've done that now.

SQL Std says "Transition tables are not generally updatable (and
therefore not simply updatable) and their columns are not updatable."

So MERGE cannot be a target of a transition table, but it can be the
source of one. So the docs you cite are correct, as is the comment.

> * Can UPDATEs themselves accept an UPDATE-style WHERE clause, in
> addition to the WHEN quals, and the main ON join quals?
>
> I don't see any examples of this in your regression tests.

No, they can't. Oracle allows it but it isn't SQL Standard.

> * You added a new MERGE command tag. Shouldn't there be changes to
> libpq's fe-exec.c, to go along with that?

Nice catch. One liner patch and docs updated in repo for next patch.

> * I noticed this:
>
>> + else if (node->operation == CMD_MERGE)
>> + {
>> + /*
>> + * XXX Add more detailed instrumentation for MERGE changes
>> + * when running EXPLAIN ANALYZE?
>> + */
>> + }
>
> I think that doing better here is necessary.

Please define better. It may be possible.

> * I'm not a fan of stored rules, but I still think that this needs to
> be discussed:
>
>> - product_queries = fireRules(parsetree,
>> + /*
>> + * First rule of MERGE club is we don't talk about rules
>> + */
>> + if (event == CMD_MERGE)
>> + product_queries = NIL;
>> + else
>> + product_queries = fireRules(parsetree,
>> result_relation,
>> event,
>> locks,

It has been discussed, in my recent proposal, mentioned in the doc
page for clarity.

It has also been discussed previously in discussions around MERGE.
It's not clear how they would work, but we already have an example of
a statement type that doesn't support rules.

> * This seems totally unnecessary at first blush:
>
>> + /*
>> + * Lock tuple for update.
>> + *
>> + * XXX Is this really needed? I put this in
>> + * just to get hold of the existing tuple.
>> + * But if we do need, then we probably
>> + * should be looking at the return value of
>> + * heap_lock_tuple() and take appropriate
>> + * action.
>> + */
>> + tuple.t_self = *tupleid;
>> + test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
>> + lockmode, LockWaitBlock, false, &buffer,
>> + &hufd);
>
> * I don't know what the deal is with EPQ and MERGE in general, because
> just after the above heap_lock_tuple(), you do this:
>
>> + /*
>> + * Test condition, if any
>> + *
>> + * In the absence of a condition we perform the action
>> + * unconditionally (no need to check separately since
>> + * ExecQual() will return true if there are no
>> + * conditions to evaluate).
>> + */
>> + if (!ExecQual(action->whenqual, econtext))
>> + {
>> + if (BufferIsValid(buffer))
>> + ReleaseBuffer(buffer);
>> + continue;
>> + }
>
> Maybe *this* is why you call heap_lock_tuple(), actually, but that's
> not clear, and in any case I don't see any point in it if you don't
> check heap_lock_tuple()'s return value, and do some other extra thing
> on the basis of that return value. No existing heap_lock_tuple()
> ignores its return value, and ignoring the return value simply can't
> make sense.

Agreed, I don't think it is necessary.

> Don't WHEN quals need to participate in EPQ reevaluation, in order to
> preserve the behavior that we see with UPDATE and DELETE? Why, or why
> not?

WHEN qual evaluation occurs to establish which action to take. The
Standard is clear that this happens prior to the action.

> I suppose that the way you evaluate WHEN quals separately makes a
> certain amount of sense, but I think that you need to get things
> straight with WHEN quals and READ COMMITTED conflict handling at a
> high level (the semantics), which may or may not mean that WHEN quals
> participate in EPQ evaluation. If you're going to introduce a special
> case, I think you need to note it under the EvalPlanQual section of
> the executor README.

I wasn't going to introduce a special case.

> This much I'm sure of: you should reevaluate WHEN quals if the UPDATE
> chain is walked in READ COMMITTED mode, one way or another. It might
> end up happening in your new heap_lock_tuple() retry loop, or you
> might do something differently with EPQ, but something should happen
> (I haven't got an opinion on the implementation details just yet,
> though).

An interesting point, thank you for raising it.

WHEN quals, if they exist, may have dependencies on either the source
or target. If there is a dependency on the target there might be an
issue.

If the WHEN has a condition AND the WHEN qual fails a re-check after
we do EPQ, then I think we should just throw an error. If we
re-evaluate everything, I'm sure we'll get into some weird cases that
make MATCHED/NOT MATCHED change and that is a certain error case for
MERGE.

We might do better than that after some thought, but that seems like a
rabbit hole we should avoid in the first release. As we agreed
earlier, we can later extend MERGE to produce less errors in certain
concurrency cases.

> * Your isolation test should be commented. I'd expect you to talk
> about what is different about MERGE as far as concurrency goes, if
> anything. I note that you don't use additional WHEN quals in your
> isolation test at all (just simple WHEN NOT MATCHED + WHEN MATCHED),
> which was the first thing I looked for there. Look at
> insert-conflict-do-update-3.spec for an example of roughly the kind of
> commentary I had hoped to see in your regression test.

I will be happy to add one that exercises some new code resulting from
the above.

>> I'm expecting to commit this and then come back for the Partitioning &
>> RLS later, but will wait a few days for comments and other reviews.
>
> I don't think that it's okay to defer RLS or partitioning support till
> after an initial commit. While it's probably true that MERGE can just
> follow ON CONFLICT's example when it comes to column-level privileges,
> this won't be true with RLS.

I think it is OK to do things in major pieces, as has been done with
many other commands. We have more time in this release to do that,
though we want to find and fix any issues in basic functionality like
concurrency ahead of trying to add fancy stuff and hitting problems
with it.

I've already made two changes your review has raised, thanks. Will
re-post soon. Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-01-24 04:16:23 Re: documentation is now XML
Previous Message Tom Lane 2018-01-24 04:08:38 Re: [PATCH] fix for C4141 warning on MSVC