Re: support for MERGE

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Daniel Westermann <dwe(at)dbi-services(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: support for MERGE
Date: 2022-03-16 19:18:48
Message-ID: 202203161918.qz6phlortw2w@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I spent some time experimenting if I could make ExecUpdateEpilogue /
ExecDeleteEpilogue return the RETURNING tuple, but this turns out not to
work very well. Maybe there's a way to not make it look completely
silly, but at least what I tried ended up strictly worse.

I realized we weren't testing "do nothing" triggers with MERGE, and lo
and behold, the previous coding misbehaved. I added test cases for it
and fixed the code. I made a few other smallish changes, too cosmetic
to mention.

Thanks to Justin, Japin, Amit for their reviews. Answers inline below.

On 2022-Mar-12, Justin Pryzby wrote:

> The .h files still order these fields differently than the other .h files, and
> then the node funcs (at least MergeAction) also have a different order than the
> .h files.

Fixed. I also moved functions around to make everything consistent. I
decided to put MergeWhereClause (raw parser output) followed by
MergeAction (output of parse analysis) and also put the fields of both
in sync and in consistent order everywhere.

On 2022-Mar-14, Amit Langote wrote:

> On Sun, Mar 13, 2022 at 12:36 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> I looked at 0001 and found a few things that could perhaps be improved.
> + /* Slot containing tuple obtained from subplan, for RETURNING */
> + TupleTableSlot *planSlot;
> I think planSlot is also used by an FDW to look up any FDW-specific
> junk attributes that the FDW's scan method would have injected into
> the slot, so maybe no need to specifically mention only RETURNING here
> as what cares about it.

I agree; I had already rewritten this comment on the plane.

> + /*
> + * The tuple produced by EvalPlanQual to retry from, if a cross-partition
> + * UPDATE requires it
> + */
> + TupleTableSlot *retrySlot;
> Maybe a bit long name, though how about calling this updateRetrySlot
> to make it apparent that what is to be retried with the slot is the
> whole UPDATE operation, not some sub-operation (like say the
> cross-partition portion)?

The reason I didn't want to have "Slot" in there is that it's a bit of
Hungarian notation; we already know that it's a slot by its type. But
apparently we do this almost everywhere, so I just took your suggestions
on naming these two fields.

> +/*
> + * Context struct containing output data specific to UPDATE operations.
> + */
> +typedef struct UpdateContext
> +{
> + bool updateIndexes; /* index update required? */
> + bool crossPartUpdate; /* was it a cross-partition update? */
> +} UpdateContext;
> I wonder if it isn't more readable to just have these two be the
> output arguments of ExecUpdateAct()?

I decided not to do that; I had to add "bool updated" back into that
struct once I noticed the problem with do-nothing triggers, as mentioned
above. Then we're talking about three output argument and that gets too
ugly IMV, but I didn't really try to write the code.

> +redo_act:
> + /* XXX reinit ->crossPartUpdate here? */
> Why not just make ExecUpdateAct() always set the flag, not only when a
> cross-partition update does occur?

Yeah, I ended up doing that. I didn't want to do that because it's
wasted work in the vast majority of cases; but if we want to avoid it,
we need to add a reset to false in the "goto redo_act" place, which
could cause future bugs of omission if we later add similar gotos.

> +/*
> + * Context struct for a ModifyTable operation, containing basic execution state
> + * and some output data.
> + */
> Does it make sense to expand "some output data", maybe as:
> ...and some output variables populated by the ExecUpdateAct() and
> ExecDeleteAct() to report the result of their actions to ExecUpdate()
> and ExecDelete() respectively.

Looks good, applied.

> + /* output */
> + TM_FailureData tmfd; /* stock failure data */
> Maybe we should be more specific here, say:
> Information about the changes that were found to be made concurrently
> to a tuple being updated or deleted

WFM. I changed "were found to be made" to "were made".

> + LockTupleMode lockmode; /* tuple lock mode */
> Also here, say:
> Lock mode to acquire on the latest tuple before performing EvalPlanQual() on it

WFM; I ditched the parens and used "latest tuple version".

> Hmm, ExecDelete() and ExecUpdate() never see a partitioned table,
> because [...]

> So the following suffices, for ExecDeleteAct():
> Actually delete the tuple from a plain table.


> and for ExecUpdateAct(), maybe it makes sense to mention the
> possibility of a cross-partition partition.
> Actually update the tuple of a plain table. If the table happens to
> be a leaf partition and the update causes its partition constraint to
> be violated, ExecCrossPartitionUpdate() is invoked to move the updated
> tuple into the appropriate partition.

I used this:
* Actually update the tuple, when operating on a plain table. If the
* table is a partition, and the command was called referencing an ancestor
* partitioned table, this routine is in charge of migrating the resulting
* tuple to another partition.

I also changed the other part of this comment, to read:

* The caller is in charge of keeping indexes current as necessary. The
* caller is also in charge of doing EvalPlanQual if the tuple is found to
* be concurrently updated. However, in case of a cross-partition update,
* this routine does it.
* Caller is in charge of doing EvalPlanQual as necessary, and of keeping
* indexes current for the update.

This is sufficient for 0001, but needs a bit of a tweak for MERGE, since
we return early in that case. (And this is TBH *the* point I'm
uncomfortable with the MERGE patch: this bit becomes squishy. I've been
wondering if it's possible to split ExecUpdateAct in even smaller pieces
in order to avoid this, but I just don't see how.)

> + * Closing steps of tuple deletion; this invokes AFTER FOR EACH ROW triggers,
> + * including the UPDATE triggers if there was a cross-partition tuple move.
> How about:
> ...including the UPDATE triggers if the ExecDelete() is being done as
> part of a cross-partition tuple move.

WFM, but I'd rather not assume that we're being called from ExecDelete.
Better to say "if the deletion is being done ..." (It remains true with
MERGE as currently, but do we know that it will forever?)

> + /* and project to create a current version of the new tuple */
> How about:
> and project the new tuple to retry the UPDATE with


On 2022-Mar-14, Japin Li wrote:

> + ar_delete_trig_tcs = mtstate->mt_transition_capture;
> + if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture &&
> + mtstate->mt_transition_capture->tcs_update_old_table)
> + {
> + ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
> + NULL, NULL, mtstate->mt_transition_capture);
> +
> + /*
> + * We've already captured the NEW TABLE row, so make sure any AR
> + * DELETE trigger fired below doesn't capture it again.
> + */
> + ar_delete_trig_tcs = NULL;
> + }
> Maybe we can use ar_delete_trig_tcs in if condition and
> ExecARUpdateTriggers() to make the code shorter.

... Well, the code inside the block is about UPDATE triggers, so it's a
nasty to use the variable whose name says it's for DELETE triggers.
That variable really is just a clever flag that indicates whether or not
to call the DELETE part. It's a bit of strange coding, but ...

> + /* "old" transition table for DELETE, if any */
> + Tuplestorestate *old_del_tuplestore;
> + /* "new" transition table INSERT, if any */
> + Tuplestorestate *new_ins_tuplestore;
> Should we add "for" for transition INSERT comment?

Absolutely. Fixed.

> +MERGE is a multiple-table, multiple-action command: it specifies a target
> +table and a source relation, and can contain multiple WHEN MATCHED and
> +WHEN NOT MATCHED clauses, each of which specifies one UPDATE, INSERT,
> +UPDATE, or DO NOTHING actions. The target table is modified by MERGE,
> +and the source relation supplies additional data for the actions
> I think the "source relation" is inaccurate. How about using "data
> source" like document?

I debated this with myself when I started using the term "source
relation" here. The result of a query is also a relation, so this term
is not incorrect; in fact, the glossary entry for "relation" explains

It's true that it's not the typical use of the term in our codebase.

Overall, I'm -0 for changing this. (If we decide to change it, there
are other places that would need to change as well.)

Álvaro Herrera PostgreSQL Developer —

Attachment Content-Type Size
v16-0001-Split-ExecUpdate-and-ExecDelete-in-reusable-piec.patch text/x-diff 40.8 KB
v16-0002-MERGE-SQL-Command-following-SQL-2016.patch text/x-diff 386.8 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua Brindle 2022-03-16 19:36:21 Re: Granting SET and ALTER SYSTE privileges for GUCs
Previous Message Tom Lane 2022-03-16 19:06:04 Re: Granting SET and ALTER SYSTE privileges for GUCs