Re: MERGE SQL statement for PG12

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: MERGE SQL statement for PG12
Date: 2019-01-15 19:05:25
Message-ID: CA+TgmoZj8fyJGAFxs=8Or9LeNyKe_xtoSN_zTeCSgoLrUye=9Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 14, 2019 at 1:37 AM Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Can you please help me understand what's fundamentally wrong with the approach and, more importantly, can you please explain what would be the architecturally sound way to do this? The same also applies to the executor side, where the current approach is deemed wrong, but very little is said on what's the correct way.

I think the basic problem is that optimization should happen in the
optimizer, not the parser. Deciding that MERGE looks like a RIGHT
JOIN is not a parsing task and therefore shouldn't happen in the
parser. The guys who were (are?) proposing support for ALIGN and
NORMALIZE for temporal joins got the same complaint. Here's Tom
talking about that:

http://postgr.es/m/32265.1510681378@sss.pgh.pa.us
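To make the point concrete, here's a toy sketch (Python, purely illustrative, not PostgreSQL code) of the right-join shape of MERGE: every source row drives the join, matched target rows fire a WHEN MATCHED action, and unmatched source rows fire WHEN NOT MATCHED:

```python
# Toy sketch of MERGE semantics as a right join of target onto source.
# Names and structure are illustrative only.

def merge(target, source, key):
    """Matched target rows are updated in place (WHEN MATCHED THEN UPDATE);
    source rows with no match are inserted (WHEN NOT MATCHED THEN INSERT)."""
    by_key = {row[key]: row for row in target}
    for src in source:
        tgt = by_key.get(src[key])
        if tgt is not None:           # WHEN MATCHED THEN UPDATE
            tgt.update(src)
        else:                         # WHEN NOT MATCHED THEN INSERT
            by_key[src[key]] = dict(src)
    return list(by_key.values())

tgt = [{"id": 1, "val": "old"}]
src = [{"id": 1, "val": "new"}, {"id": 2, "val": "fresh"}]
print(merge(tgt, src, "id"))
```

The question is not whether this shape is right, but where in the pipeline it should be decided.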

I realize that you may have worked around some of the specific issues
that Tom mentions in that email, but I believe the general point
remains valid. What parse analysis should do is construct a
representation of the query the user entered, not some other query
derived from it. And this code clearly does the latter. You can see
by the fact that it uses QSRC_PARSER, for example, a constant that is
currently unused (and for good reason). And you can see that it
results in hacks like the cross-check to make sure that
resultRelRTE->relid == mergeRelRTE->relid. You need that because you
are manipulating and transforming the query too early in the pipeline.
If you do the transformation in the optimizer, you won't need stuff
like this. There are other hints in this function that you're really
doing planning tasks:

+ * The MERGE query's join can be tuned in some cases, see below for these
+ * special case tweaks.

Tuning the query is otherwise known as optimization. Do it in the optimizer.

+ * Simplify the MERGE query as much as possible
+ *
+ * These seem like things that could go into Optimizer, but they are
+ * semantic simplifications rather than optimizations, per se.

Actually, they are optimizations. The fact that an outer join may be
more expensive than an inner join is a costing consideration, not
something that the parser needs to know about.
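For instance, if a MERGE has no WHEN NOT MATCHED arm, the right join that implements it can be reduced to an inner join, because unmatched source rows do nothing. That's exactly the kind of decision the planner should own. A toy sketch (illustrative only, not real planner code):

```python
# Toy planner-style simplification: pick the join type for a MERGE based
# on which WHEN clauses are present. Purely illustrative.

def choose_join_type(when_clauses):
    """when_clauses: list of ("MATCHED" | "NOT MATCHED", action) pairs."""
    needs_unmatched = any(kind == "NOT MATCHED" for kind, _ in when_clauses)
    # Only a right join preserves source rows with no target match.
    return "RIGHT JOIN" if needs_unmatched else "INNER JOIN"

print(choose_join_type([("MATCHED", "UPDATE")]))                             # INNER JOIN
print(choose_join_type([("MATCHED", "UPDATE"), ("NOT MATCHED", "INSERT")]))  # RIGHT JOIN
```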

+ /*
+ * This query should just provide the source relation columns. Later, in
+ * preprocess_targetlist(), we shall also add "ctid" attribute of the
+ * target relation to ensure that the target tuple can be fetched
+ * correctly.
+ */

This is not a problem unique to MERGE. It comes up for UPDATE and
DELETE as well. The handling is not all in one place, but it's all in
the optimizer. The place which actually adds the CTID column is in
preprocess_targetlist(), but note that it's much smarter than the code
in the MERGE patch, because it can do different things depending on
which type of rowmark is requested. The code as it exists in the
MERGE patch can't possibly work for any table that doesn't have a CTID,
which means it won't work for FDWs and, in the future, any new
heap types added via pluggable storage unless they happen to support
CTID. Correctly written, MERGE will leverage what the optimizer
already knows about rowmarks rather than reinventing them in the parse
analysis phase.
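To illustrate the rowmark-driven approach the optimizer already takes (enum names mirror PostgreSQL's, but the logic is a toy, not the real preprocess_targetlist()): the row-identity column added to the targetlist depends on the kind of rowmark, instead of hard-coding "ctid":

```python
# Toy sketch: choose a row-identity column by rowmark kind rather than
# assuming every relation has a ctid. Illustrative only.
from enum import Enum, auto

class RowMarkType(Enum):
    ROW_MARK_REFERENCE = auto()   # plain heap table: refetch by ctid
    ROW_MARK_COPY = auto()        # FDWs etc.: no usable ctid, carry the row

def row_identity_columns(mark):
    if mark is RowMarkType.ROW_MARK_REFERENCE:
        return ["ctid"]           # a physical tuple id exists
    return ["wholerow"]           # copy the whole row instead

print(row_identity_columns(RowMarkType.ROW_MARK_COPY))
```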

You should redesign this whole representation so that you just pass
the whole query through to the optimizer without any structural
change. Just as we do for other statements, you need to do the basic
transformation stuff like looking up relation OIDs: that has to do
with what the query MEANS and the external SQL objects on which it
DEPENDS; those things have to be figured out before planning so that
things like the plan-cache work correctly. But also just as we do for
other statements, you should avoid having any code in this function
that has to do with the way in which the MERGE should ultimately be
executed, and you should not have any code here that builds a new
query or does heavy restructuring of the parse tree.

Since it looks like we will initially have only one strategy for
executing MERGE, it might be OK to move the transformation that you're
currently doing in transformMergeStmt() to some early phase of the
optimizer. Here's me making approximately the same point about ALIGN
and NORMALIZE:

https://www.postgresql.org/message-id/CA+TgmoZ=UkRzpisqK5Qox_ekLG+SQP=xBeFiDkXTgLF_=1FH+Q@mail.gmail.com

However, I'm not sure that's actually the right way to go. Something,
possibly this transformation, is causing us to end up with two copies
of the target relation. This comment is a pretty good clue that this
is nothing but an ugly kludge:

+ * While executing MERGE, the target relation is processed twice; once
+ * as a target relation and once to run a join between the target and the
+ * source. We generate two different RTEs for these two purposes, one with
+ * rte->inh set to false and other with rte->inh set to true.
+ *
+ * Since the plan re-evaluated by EvalPlanQual uses the join RTE, we must
+ * install the updated tuple in the scan corresponding to that RTE. The
+ * following member tracks the index of the second RTE for EvalPlanQual
+ * purposes. ri_mergeTargetRTI is non-zero only when MERGE is in-progress.
+ * We use ri_mergeTargetRTI to run EvalPlanQual for MERGE and
+ * ri_RangeTableIndex elsewhere.

First of all, overloading the rte->inh flag to have anything to do
with this seems dead wrong. It's not a good idea to reuse random
Boolean variables for unrelated purposes. Moreover, I can't see how
this could ever be made to work with partitioned tables if that flag
has already been taken over for something else. Second, the fact that
you need logic to take the EPQ tuple from one scan and stick it into
the other scan strongly suggests to me that there shouldn't be two
scans or two RTEs in the first place.
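The flag-overloading hazard is easy to see in miniature (a toy struct, not the real RangeTblEntry, though the inh field mirrors it): once inh is reused to mean "this is the join RTE", a partitioned target, which legitimately needs inh = true for partition expansion, can no longer be told apart:

```python
# Toy illustration of why overloading rte->inh breaks down.
from dataclasses import dataclass

@dataclass
class RangeTblEntry:
    relid: int
    inh: bool   # normal meaning: "expand inheritance children (partitions)"

def is_merge_join_rte(rte):
    # The overloaded reading: inh=True marks the MERGE join RTE.
    return rte.inh

# A partitioned target needs inh=True for its own reasons, so the
# overloaded test misclassifies it as the join RTE:
partitioned_target = RangeTblEntry(relid=42, inh=True)
print(is_merge_join_rte(partitioned_target))  # True, yet it is not the join RTE
```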

I want to point out that it is not as if nobody has reviewed this
patch previously. Here is Andres making basically the same point
about parse analysis that I'm making here -- FWIW, I didn't find his
reply until after I'd written the above:

https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanobiup%40alap3.anarazel.de

Here he is explaining these points some more:

https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32gqpe%40alap3.anarazel.de

And here's Peter Geoghegan not only explaining the same problem but
having a go at fixing it:

https://www.postgresql.org/message-id/CAH2-Wz%3DZwNQvp11XjeHo-dBLHr9GDRi1vao8w1j7LQ8mOsUkzw%40mail.gmail.com

Actually, I see that Peter Geoghegan sent not just the emails above but
a blizzard of other emails explaining the structural problems with the
patch, which I now see include not only the parse analysis concerns
but also the business of multiple RTEs which I mentioned above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
