Re: support for MERGE

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Daniel Westermann <dwe(at)dbi-services(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Subject: Re: support for MERGE
Date: 2022-01-28 22:19:33
Message-ID: 20220128221933.qgu4n7tphdiryjt3@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-01-28 17:27:37 -0300, Alvaro Herrera wrote:
> MERGE, v10. I am much more comfortable with this version; I have
> removed a bunch of temporary hacks and cleaned up the interactions with
> table AM and executor, which is something that had been bothering me for
> a while. The complete set of changes can be seen in github,
> https://github.com/alvherre/postgres/commits/merge-15

Any chance you could split this into something more reviewable? The overall
diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-) thats
pretty hard to really review. Incremental commits don't realy help with that.

> I am not aware of anything of significance in terms of remaining work
> for this project.

One thing from skimming: There's not enough documentation about the approach
imo - it's a complicated enough feature to deserve more than the one paragraph
in src/backend/executor/README.

I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an
executor node.

> The one thing I'm a bit bothered about is the fact that we expose a lot of
> executor functions previously static. I am now wondering if it would be
> better to move the MERGE executor support functions into nodeModifyTable.c,
> which I think would mean we would not have to expose those function
> prototypes.

I'm worried about the complexity of nodeModifyTuple.c as is. Moving more code
in there does not seem like the right call to me - we should do the opposite
if anything.

A few inline comments below. No real review, just stuff noticed while skimming
to see where this is at.

Greetings,

Andres

[1] https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32gqpe@alap3.anarazel.de
[2] https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanobiup@alap3.anarazel.de

> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 1a9c1ac290..280ac40e63 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c

This stuff seems like one candidate for splitting out.

> + /*
> + * We maintain separate transition tables for UPDATE/INSERT/DELETE since
> + * MERGE can run all three actions in a single statement. Note that UPDATE
> + * needs both old and new transition tables whereas INSERT needs only new,
> + * and DELETE needs only old.
> + */
> +
> + /* "old" transition table for UPDATE, if any */
> + Tuplestorestate *old_upd_tuplestore;
> + /* "new" transition table for UPDATE, if any */
> + Tuplestorestate *new_upd_tuplestore;
> + /* "old" transition table for DELETE, if any */
> + Tuplestorestate *old_del_tuplestore;
> + /* "new" transition table INSERT, if any */
> + Tuplestorestate *new_ins_tuplestore;
> +
> TupleTableSlot *storeslot; /* for converting to tuplestore's format */
> };

Do we need to do something slightly clever about the memory limits for the
tuplestore? Each of them having a separate work_mem makes nodeModifyTable.c
one memory hungry node (the worst of all maybe?).

> +
> +/*
> + * Perform MERGE.
> + */
> +TupleTableSlot *
> +ExecMerge(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
> + EState *estate, TupleTableSlot *slot)
> +{
> + ExprContext *econtext = mtstate->ps.ps_ExprContext;
> + ItemPointer tupleid;
> + ItemPointerData tuple_ctid;
> + bool matched = false;
> + Datum datum;
> + bool isNull;
> +
> + /*
> + * Reset per-tuple memory context to free any expression evaluation
> + * storage allocated in the previous cycle.
> + */
> + ResetExprContext(econtext);

Hasn't this already happened in ExecModifyTable()?

> + /*
> + * We run a JOIN between the target relation and the source relation to
> + * find a set of candidate source rows that have a matching row in the
> + * target table and a set of candidate source rows that do not have a
> + * matching row in the target table. If the join returns a tuple with the
> + * target relation's row-ID set, that implies that the join found a
> + * matching row for the given source tuple. This case triggers the WHEN
> + * MATCHED clause of the MERGE. Whereas a NULL in the target relation's
> + * row-ID column indicates a NOT MATCHED case.
> + */
> + datum = ExecGetJunkAttribute(slot,
> + resultRelInfo->ri_RowIdAttNo,
> + &isNull);

Could this be put somewhere common with the equivalent call in
ExecModifyTable()?

> + * A concurrent update can:
> + *
> + * 1. modify the target tuple so that it no longer satisfies the
> + * additional quals attached to the current WHEN MATCHED action
> + *
> + * In this case, we are still dealing with a WHEN MATCHED case.
> + * In this case, we recheck the list of WHEN MATCHED actions from
> + * the start and choose the first one that satisfies the new target
> + * tuple.

Duplicated "in this case"

> + case TM_Invisible:
> +
> + /*
> + * This state should never be reached since the underlying
> + * JOIN runs with a MVCC snapshot and EvalPlanQual runs
> + * with a dirty snapshot. So such a row should have never
> + * been returned for MERGE.
> + */
> + elog(ERROR, "unexpected invisible tuple");
> + break;

Seems like it was half duplicated at some point?

> + case TM_Updated:
> + case TM_Deleted:
> +
> + /*
> + * The target tuple was concurrently updated/deleted by
> + * some other transaction.
> + *
> + * If the current tuple is the last tuple in the update
> + * chain, then we know that the tuple was concurrently
> + * deleted. Just return and let the caller try NOT MATCHED
> + * actions.

Is it really correct to behave that way when using repeatable read /
serializable?

> + /*
> + * Project the tuple. In case of a partitioned table, the
> + * projection was already built to use the root's descriptor,
> + * so we don't need to map the tuple here.
> + */
> + actionInfo.actionState = action;
> + insert_slot = ExecProject(action->mas_proj);
> +
> + (void) ExecInsert(mtstate, rootRelInfo,
> + insert_slot, slot,
> + estate, &actionInfo,
> + mtstate->canSetTag);
> + InstrCountFiltered1(&mtstate->ps, 1);

What happens if somebody concurrently inserts a conflicting row?

> --- a/src/include/executor/instrument.h
> +++ b/src/include/executor/instrument.h
> @@ -82,8 +82,11 @@ typedef struct Instrumentation
> double ntuples; /* total tuples produced */
> double ntuples2; /* secondary node-specific tuple counter */
> double nloops; /* # of run cycles for this node */
> - double nfiltered1; /* # of tuples removed by scanqual or joinqual */
> - double nfiltered2; /* # of tuples removed by "other" quals */
> + double nfiltered1; /* # tuples removed by scanqual or joinqual OR
> + * # tuples inserted by MERGE */
> + double nfiltered2; /* # tuples removed by "other" quals OR #
> + * tuples updated by MERGE */
> + double nfiltered3; /* # tuples deleted by MERGE */

It was a bad idea to have nfiltered1/2. Arriving at nfiltered3, it seems we
really should rename them...

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2022-01-28 22:32:37 Re: support for MERGE
Previous Message Tom Lane 2022-01-28 22:16:32 Re: Support tab completion for upper character inputs in psql