Re: [HACKERS] MERGE SQL Statement for PG11

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-11 03:57:08
Message-ID: CAH2-Wz=ZwNQvp11XjeHo-dBLHr9GDRi1vao8w1j7LQ8mOsUkzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 9, 2018 at 6:55 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> * Is this actually needed at all?:
>>
>> > + /* In MERGE when and condition, no system column is allowed */
>> > + if (pstate->p_expr_kind == EXPR_KIND_MERGE_WHEN_AND &&
>> > + attnum < InvalidAttrNumber &&
>> > + !(attnum == TableOidAttributeNumber || attnum ==
>> > ObjectIdAttributeNumber))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
>> > + errmsg("system column \"%s\" reference in WHEN AND
>> > condition is invalid",
>> > + colname),
>> > + parser_errposition(pstate, location)));
>>
>> We're talking about the scantuple here. It's not like excluded.*.

I often care about things like system columns not because of the
user-visible functionality, but because it reassures me that the
design is robust.

>> I think that this is a blocker, unfortunately.
>
> You mean OVERRIDING or ruleutils?

I meant OVERRIDING, but ruleutils seems like something we need, too.

>> >> * I still think we probably need to add something to ruleutils.c, so
>> >> that MERGE Query structs can be deparsed -- see get_query_def().
>> >
>> > Ok. I will look at it. Not done in this version though.
>>
>> I also wonder what it would take to support referencing CTEs. Other
>> implementations do have this. Why can't we?
>
>
> Hmm. I will look at them. But TBH I would like to postpone them to v12 if
> they turn out to be tricky. We have already covered a very large ground in
> the last few weeks, but we're approaching the feature freeze date.

I quickly implemented CTE support myself (not wCTE support, since
MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
when I mechanically duplicate the approach taken with other types of
DML statement in the parser. I have written a few tests, and so far it
holds up.

I also undertook something quite a bit harder: Changing the
representation of the range table from parse analysis on. As I
mentioned in passing at one point, I'm concerned about the fact that
there are two RTEs for the target relation in all cases. You
introduced a separate Query.resultRelation-style RTI index,
Query.mergeTarget_relation, and we see stuff like this through every
stage of query processing, from parse analysis right through to
execution. One problem with the existing approach is that it leaves
many cases where EXPLAIN MERGE shows the target relation alias "t" as
"t_1" for some planner nodes, though not for others. More importantly,
I suspect that having two distinct RTEs has introduced special cases
that are not really needed, and will be more bug-prone (in fact, there
are bugs already, which I'll get to at the end). I think that it's
fair to say that what happens in the patch with EvalPlanQual()'s RTI
argument is ugly, especially because we also have a separate
resultRelInfo that we *don't* use. We should aspire to have a MERGE
implementation that isn't terribly different to UPDATE or DELETE,
especially within the executor.

I wrote a very rough patch that arranged for the MERGE rtable to have
only a single relation RTE. My approach was to teach
transformFromClauseItem() and friends to recognize that they shouldn't
create a new RTE for the MERGE target RangeVar. I actually added some
hard coding to getRTEForSpecialRelationTypes() for this, which is ugly
as sin, but the general approach is likely salvageable (we could
invent a special type of RangeVar to do this, perhaps). The point here
is that everything just works if there isn't a separate RTE for the
join's leftarg and the setTargetTable() target, with exactly one
exception: partitioning becomes *thoroughly* broken. Every single
partitioning test fails with "ERROR: no relation entry for relid 1",
and occasionally some other "can't happen" error. This looks like it
would be hard to fix -- at least, I'd find it hard to fix.

This is an extract from header comments for inheritance_planner()
(master branch):

* We have to handle this case differently from cases where a source relation
* is an inheritance set. Source inheritance is expanded at the bottom of the
* plan tree (see allpaths.c), but target inheritance has to be expanded at
* the top. The reason is that for UPDATE, each target relation needs a
* different targetlist matching its own column set. Fortunately,
* the UPDATE/DELETE target can never be the nullable side of an outer join,
* so it's OK to generate the plan this way.

Of course, that isn't true of MERGE: The MERGE target *can* be the
nullable side of an outer join. That's probably a big complication for
using one target RTE. Your approach to implementing partitioning [1]
seems to benefit from having two different RTEs, in a way -- you
sidestep the restriction. As you put it, "Since MERGE need both the
facilities [both INSERT and UPDATE facilities], I'd to pretty much
merge both the machineries". As I understand it, you "merge" these
machineries by having find_mergetarget_for_rel() work backwards. You
are mapping from one relation tree to another relation tree in an
ad-hoc fashion -- both trees for the same underlying RTE. (You
compensate for this later, in the executor, with the special
EvalPlanQual() RTI stuff I mentioned already.)

I have some broad concerns here. I would especially like to hear from
hackers that know more about partitioning/inheritance than I do on
these concerns. They are:

* Is it okay to take this approach with partitioning?

I worry about things like ExecAuxRowMark handling. We avoid calling
EvalPlanQualSetPlan() within ExecModifyTable() for CMD_MERGE, so I'm
pretty sure that that's already broken as-is. Separately, can't see
why it's okay that CMD_MERGE doesn't have mt_transition_capture
initialization occur per-child, so that's probably another bug of the
same general variety. To be clear, this is the specific part of the
patch that avoids going through child plans as described:

> @@ -1927,7 +2590,14 @@ ExecModifyTable(PlanState *pstate)
> {
> /* advance to next subplan if any */
> node->mt_whichplan++;
> - if (node->mt_whichplan < node->mt_nplans)
> +
> + /*
> + * If we are executing MERGE, we only need to execute the first
> + * subplan since it's guranteed to return all the required tuples.
> + * In fact, running remaining subplans would be a problem since we
> + * will end up fetching the same tuples N times.
> + */
> + if (node->mt_whichplan < node->mt_nplans && (operation != CMD_MERGE))
> {
> resultRelInfo++;
> subplanstate = node->mt_plans[node->mt_whichplan];

* Is there a way to make what I describe (having a single target RTE)
work with partitioning, without any big new special cases, especially
in the executor?

* Any thoughts on this multiple-RTEs-for-target-rel business more generally?

[1] https://postgr.es/m/CABOikdPjjG+JcdNeegrL7=BtPdJ6yEv--V4hU8KzJTTwX1SNmw@mail.gmail.com
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-03-11 05:22:02 Re: [HACKERS] MERGE SQL Statement for PG11
Previous Message Peter Eisentraut 2018-03-10 23:36:48 Re: disable SSL compression?