Re: [HACKERS] MERGE SQL Statement for PG11

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-21 19:45:24
Message-ID: CAH2-WzmSMUpH35eZpaWyhs7M-AoSNOcW4CPvGwnnNP+nrbGxAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 21, 2018 at 5:23 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> * The docs say "A condition cannot contain subqueries, set returning
>> functions, nor can it contain window or aggregate functions". Thought
>> it can now?
>
>
> Yes, it now supports sub-queries. What about set-returning, aggregates etc?
> I assume they are not supported in other places such as WHERE conditions and
> JOIN quals. So they will continue to remain blocked even in WHEN conditions.
> Do you think it's worth mentioning or we should not mention anything at all?

I would not mention anything at all. It's just the same as other DML statements.

>> * The docs say "INSERT actions cannot contain sub-selects". Didn't that
>> change?
>
>
> No, it did not. We only support VALUES clause with INSERT action.

But can't you have a subselect in the VALUES()? Support for subselects
seems like a totally distinct thing to the restriction that only a
(single row) VALUES() is allowed in INSERT actions.

> You mean the doc changes are unnecessary or the EXPLAIN ANALYZE output is
> unnecessary? I assume the doc changes, but let me know if that's wrong
> assumption.

I meant the doc changes.

> Again, do you mean we should raise error or just that the docs should not
> mention anything about it? I don't think raising an error because the
> candidate row did not meet any specified action is a good idea. May be some
> day we add another option to store such rows in a separate temporary table.

I agree that it's not a good idea to raise an error because the
candidate row did not meet any specified action. My point is that you
don't even need to mention that we don't do that in the docs. It's not
like there is any reason to expect that we should -- there is no
precedent.

I think that that's in the docs because this behavior was contemplated
during MERGE's development. But that behavior was ultimately rejected.
In large part because there was no precedent.

>> * It might make sense to point out in the docs that join_condition
>> should not filter the target table too much. Like SQL server docs say,
>> don't put things in the join that filter the target that actually
>> belong in the WHEN .. AND quals. In a way, this should be obvious,
>> because it's an outer join. But I don't think it is, and ISTM that the
>> sensible thing to do is to warn against it.
>>
>
> Hmm, ok. Not sure how exactly to put that in words without confusing users.
> Do you want to suggest something?

Perhaps a Warning box should say:

Only columns from "target_table_name" that attempt to match
"data_source" rows should appear in "join_condition".
"join_condition" subexpressions that only reference
"target_table_name" columns can only affect which action is taken,
often in surprising ways.

>> * We never actually get around to saying that MERGE is good with bulk
>> loading, ETL, and so on. I think that we should remark on that in
>> passing.
>
>
> Suggestion?

How about adding this sentence after "MERGE ... a task that would
otherwise require multiple procedural language statements":

MERGE can synchronize two tables by modifying one table based on
differences between it and some other table.

The point here is that we're primarily talking about two whole tables.
That deserves such prominent placement, as that suggests where users
might really find MERGE useful, but without being too prescriptive.

Also, instead of saying "There are a variety of differences and
restrictions between the two statement types [MERGE and INSERT ... ON
CONFLICT DO UPDATE] and they are not interchangeable", you could
instead be specific, and say:

MERGE is well suited to synchronizing two tables using multiple
complex conditions. Using INSERT with ON CONFLICT DO UPDATE works well
when requirements are simpler. Only ON CONFLICT provides an atomic
INSERT or UPDATE outcome in READ COMMITTED mode.

BTW, the docs should be clear on the fact that "INSERT ... ON
CONFLICT" isn't a statement. INSERT is. ON CONFLICT is a clause.

>> * I think that the mvcc.sgml changes can go. Perhaps a passing
>> reference to MERGE can be left behind, that makes it clear that it's
>> really rather like UPDATE FROM and so on. The fact that it's like
>> UPDATE FROM now seems crystal clear.
>>
>
> It seems useful to me. Should we move it to merge.sgml instead?

The mvcc.sgml changes read like a status report on the patch's
behavior with concurrency. Obviously that general tone is not
appropriate for a committed patch. Also, what it describes doesn't
seem to have much to do with MVCC rules per say. The only thing that
seems to warrant discussion in mvcc.sgml is how MERGE really *isn't* a
special case. ISTM that you only really need to mention how the
decision to use one particular WHEN action can change repeatedly -
every time you walk the UPDATE chain, you start that part from the
beginning.

The "you might get a duplicate violation" bit can definitely live in
merge.sgml, right at the point that ON CONFLICT is mentioned (the Tip
box). I don't think that you need too much on this.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-21 19:47:41 Re: JIT compiling with LLVM v12.2
Previous Message David Steele 2018-03-21 19:31:43 Re: Re: ALTER TABLE ADD COLUMN fast default