Re: [HACKERS] MERGE SQL Statement for PG11

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-08 12:54:25
Message-ID: CABOikdOUoaXt1H885TC_cA1LoErEejqdVDZqG62rQkiZPPyg0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 6, 2018 at 9:55 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

>
>
> Sure, ExecUpdate() and ExecDelete() *do* require a little special
> handling for MERGE, but ExecMerge() should itself call EvalPlanQual()
> for the join quals, rather than getting ExecUpdate()/ExecDelete() to
> do it. All that ExecUpdate()/ExecDelete() need to tell ExecMerge() is
> "you need to do your HeapTupleUpdated stuff". This not only clarifies
> the new EPQ design -- it also lets you do that special case rti EPQ
> stuff in the right place (BTW, I'm not a huge fan of that detail in
> general, but haven't thought about it enough to say more). The new EPQ
> code within ExecUpdate() and ExecDelete() is already identical, so why
> not do it that way?
>
> I especially like this organization because ExecOnConflictUpdate()
> already calls ExecUpdate() without expecting it to do EPQ handling at
> all, which is kind of similar to what I suggest -- in both cases, the
> caller "takes care of all of the details of the scan". And, by
> returning a HTSU_Result value to ExecMerge() from
> ExecUpdate()/ExecDelete(), you could also do the new cardinality
> violation stuff from within ExecMerge() in exactly one place -- no
> need to invent new error_on_SelfUpdate arguments.
>
> I would also think about organizing ExecMerge() handling so that
> CMD_INSERT handling became WHEN NOT MATCHED handling. It's
> unnecessarily confusing to have that included in the same switch
> statement as CMD_UPDATE and CMD_DELETE, since that organization fails
> to convey that once we reach WHEN NOT MATCHED, there is no going back.
> Actually, I suppose the new EPQ organization described in the last few
> paragraphs already implies that you'd do this CMD_INSERT stuff, since
> the high level goal is breaking ExecMerge() into WHEN MATCHED and WHEN
> NOT MATCHED sections (doing "commonality and variability analysis" for
> CMD_UPDATE and CMD_DELETE only serves that high level goal).
>
>
Thanks for the feedback. I've greatly refactored the code based on your
comments and I too like the resultant code. What we have now have
essentially is: two functions:

ExecMergeMatched() - deals with matched rows. In case of concurrent
update/delete, it also runs EvalPlanQual and checks if the updated row
still meets the join quals. If so, it will restart and recheck if some
other WHEN MATCHED action should be executed. If the target row is deleted
or if the join quals fail, it simply returns and let the next function
handle it

ExecMargeNotMatched() - deals with not-matched rows. Essentially it only
needs to execute INSERT if a WHEN NOT MATCHED action exists and the
additional quals pass.

Per your suggestion, ExecUpdate() and ExecDelete() only returns enough
information for ExecMergeMatched() to run EvalPlanQual or take any other
action, as needed. So changes to these functions are now minimal. Also, the
EPQ handling for UPDATE/DELETE is now common, which is nice.

I've also added a bunch of comments, but it might still not be enough. Feel
free to suggest improvements there.

> Other feedback:
>
> * Is a new ExecMaterializeSlot() call needed for the EPQ slot? Again,
> doing your own EvalPlanQual() within ExecMerge() would perhaps have
> avoided this.
>

Fixed.

>
> * Do we really need UpdateStmt raw parser node pointers in structs
> that appear in execnodes.h? What's that all about?
>

No, it was left-over from the earlier code. Removed.

>
> * Tests for transition table behavior (mixed INSERTs, UPDATEs, and
> DELETEs) within triggers.sql seems like a good idea.
>

Ok, I will add. But not done in this version.

>
> * Is this comment really accurate?:
>
> > + /*
> > + * If EvalPlanQual did not return a tuple, it means we
> > + * have seen a concurrent delete, or a concurrent update
> > + * where the row has moved to another partition.
> > + *
> > + * Set matched = false and loop back if there exists a
> NOT
> > + * MATCHED action. Otherwise, we have nothing to do for
> this
> > + * tuple and we can continue to the next tuple.
> > + */
>
> Won't we loop back when a concurrent update occurs that makes the new
> candidate tuple not satisfy the join qual anymore? What does this have
> to do with partitioning?
>

Yeah, it was wrong. Fixed as part of larger reorganisation anyways.

>
> * Why does MergeJoin get referenced here?:
>
> > + * Also, since the internal MergeJoin node can hide the source and
> target
> > + * relations, we must explicitly make the respective relation as
> visible so
> > + * that columns can be referenced unqualified from these relations.
>

I meant to say underlying JOIN for MERGE. Fixed by simply saying Join.

>
> * This patch could use a pg_indent.
>
>
Done.

> * We already heard about this code from Tomas, who found it unnecessary:
>
> > + /*
> > + * SQL Standard says that WHEN AND conditions must not
> > + * write to the database, so check we haven't written
> > + * any WAL during the test. Very sensible that is, since
> > + * we can end up evaluating some tests multiple times if
> > + * we have concurrent activity and complex WHEN clauses.
> > + *
> > + * XXX If we had some clear form of functional labelling
> > + * we could use that, if we trusted it.
> > + */
> > + if (startWAL < GetXactWALBytes())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("cannot write to database within WHEN
> AND condition")));
>
> This needs to go. Apart from the fact that GetXactWALBytes() is buggy
> (it returns int64 for the unsigned type XLogRecPtr), the whole idea
> just seems unnecessary. I don't see why this is any different to using
> a volatile function in a regular UPDATE.
>

I removed this code since it was wrong. We might want to add some basic
checks for existence of volatile functions in the WHEN or SET clauses. But
I agree, it's no different than regular UPDATEs. So may be not a big deal.

>
> * I still think we probably need to add something to ruleutils.c, so
> that MERGE Query structs can be deparsed -- see get_query_def().
>

Ok. I will look at it. Not done in this version though.

Rebased on the current master too.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
merge_v19b.patch application/octet-stream 289.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-03-08 13:11:33 Re: ALTER TABLE ADD COLUMN fast default
Previous Message Shubham Barai 2018-03-08 12:40:09 Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)