Re: [HACKERS] MERGE SQL Statement for PG11

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-11 05:22:02
Message-ID: CABOikdPu4NCQ3rZo95bONOzZFQ5yYavwy0Ziyc9td-ngsgiwKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 11, 2018 at 9:27 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

>
> >>
> >> We're talking about the scantuple here. It's not like excluded.*.
>
> I often care about things like system columns not because of the
> user-visible functionality, but because it reassures me that the
> design is robust.
>
>
Ok. I will look at it. I think it shouldn't be too difficult and the
original restriction was mostly a fallout of expecting CHECK constraint
style expressions there.

> >> I think that this is a blocker, unfortunately.
> >
> > You mean OVERRIDING or ruleutils?
>
> I meant OVERRIDING, but ruleutils seems like something we need, too.
>

Ok. OVERRIDING is done. I think we can support ruleutils easily too. I
don't know how to test that though.

>
> >> >> * I still think we probably need to add something to ruleutils.c, so
> >> >> that MERGE Query structs can be deparsed -- see get_query_def().
> >> >
> >> > Ok. I will look at it. Not done in this version though.
> >>
> >> I also wonder what it would take to support referencing CTEs. Other
> >> implementations do have this. Why can't we?
> >
> >
> > Hmm. I will look at them. But TBH I would like to postpone them to v12 if
> > they turn out to be tricky. We have already covered a very large ground
> in
> > the last few weeks, but we're approaching the feature freeze date.
>
> I quickly implemented CTE support myself (not wCTE support, since
> MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
> when I mechanically duplicate the approach taken with other types of
> DML statement in the parser. I have written a few tests, and so far it
> holds up.
>

Ok, thanks. I started doing something similar, but great if you have
already implemented. I will focus on other things for now.

>
> I also undertook something quite a bit harder: Changing the
> representation of the range table from parse analysis on. As I
> mentioned in passing at one point, I'm concerned about the fact that
> there are two RTEs for the target relation in all cases. You
> introduced a separate Query.resultRelation-style RTI index,
> Query.mergeTarget_relation, and we see stuff like this through every
> stage of query processing, from parse analysis right through to
> execution. One problem with the existing approach is that it leaves
> many cases where EXPLAIN MERGE shows the target relation alias "t" as
> "t_1" for some planner nodes, though not for others. More importantly,
> I suspect that having two distinct RTEs has introduced special cases
> that are not really needed, and will be more bug-prone (in fact, there
> are bugs already, which I'll get to at the end). I think that it's
> fair to say that what happens in the patch with EvalPlanQual()'s RTI
> argument is ugly, especially because we also have a separate
> resultRelInfo that we *don't* use. We should aspire to have a MERGE
> implementation that isn't terribly different to UPDATE or DELETE,
> especially within the executor.
>

I thought for a while about this and even tried multiple approaches before
settling for what we have today. The biggest challenge is that
inheritance/partition tables take completely different paths for INSERTs
and UPDATE/DELETE. The RIGHT OUTER JOIN makes it kinda difficult because
the regular UPDATE/DELETE code path ends up throwing duplicates when the
source table is joined with individual partitions. IIRC that's the sole
reason why I'd to settle on pushing the JOIN underneath, give it SELECT
like treatment and then handle UPDATE/DELETE in the executor.

>
> I wrote a very rough patch that arranged for the MERGE rtable to have
> only a single relation RTE. My approach was to teach
> transformFromClauseItem() and friends to recognize that they shouldn't
> create a new RTE for the MERGE target RangeVar. I actually added some
> hard coding to getRTEForSpecialRelationTypes() for this, which is ugly
> as sin, but the general approach is likely salvageable (we could
> invent a special type of RangeVar to do this, perhaps). The point here
> is that everything just works if there isn't a separate RTE for the
> join's leftarg and the setTargetTable() target, with exactly one
> exception: partitioning becomes *thoroughly* broken. Every single
> partitioning test fails with "ERROR: no relation entry for relid 1",
> and occasionally some other "can't happen" error. This looks like it
> would be hard to fix -- at least, I'd find it hard to fix.
>
>
Ok. If you've something which is workable, then great. But AFAICS this is
what the original patch was doing until we came to support partitioning.
Even with partitioning I could get everything to work, without duplicating
the RTE, except the duplicate rows issue. I don't know how to solve that
without doing what I've done or completely rewriting UPDATE/DELETE handling
for inheritance/partition table. If you or others have better ideas, they
are most welcome.

> This is an extract from header comments for inheritance_planner()
> (master branch):
>
> * We have to handle this case differently from cases where a source
> relation
> * is an inheritance set. Source inheritance is expanded at the bottom of
> the
> * plan tree (see allpaths.c), but target inheritance has to be expanded at
> * the top. The reason is that for UPDATE, each target relation needs a
> * different targetlist matching its own column set. Fortunately,
> * the UPDATE/DELETE target can never be the nullable side of an outer
> join,
> * so it's OK to generate the plan this way.
>
> Of course, that isn't true of MERGE: The MERGE target *can* be the
> nullable side of an outer join. That's probably a big complication for
> using one target RTE. Your approach to implementing partitioning [1]
> seems to benefit from having two different RTEs, in a way -- you
> sidestep the restriction.

Right. The entire purpose of having two different RTEs is to work around
this problem. I explained this approach here [1]. I didn't receive any
objections then, but that's mostly because nobody read it carefully. As I
said, if we have an alternate feasible and better mechanism, let's go for
it as long as efforts are justifiable.

Thanks,
Pavan

[1]
https://www.postgresql.org/message-id/CABOikdM%2Bc1vB_%2B3tYEjO%3DJ6U2uNHzKU_b%3DU72tadD5-9xQcbHA%40mail.gmail.com

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-03-11 05:27:52 Re: Faster inserts with mostly-monotonically increasing values
Previous Message Peter Geoghegan 2018-03-11 03:57:08 Re: [HACKERS] MERGE SQL Statement for PG11