Re: MERGE ... WHEN NOT MATCHED BY SOURCE

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: MERGE ... WHEN NOT MATCHED BY SOURCE
Date: 2024-03-04 09:44:55
Message-ID: CAEZATCV-7j0wq6T2AGsyRbaVY5muZ8KJ07L-=8Pvt=a3w1V5vA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 29 Jan 2024 at 10:07, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> One thing that's bothering me though is what happens if a row being
> merged is concurrently updated. Specifically, if a concurrent update
> causes a formerly matching row to no longer match the join condition,
> and there are both NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET
> actions, so that it's doing in full join between the source and target
> relations. In this case, when the EPQ mechanism rescans the subplan
> node, there will be 2 possible output tuples (one with source null,
> and one with target null), and EvalPlanQual() will just return the
> first one, which is a more-or-less arbitrary choice, depending on the
> type of join (hash/merge), and (for a mergejoin) the values of the
> inner and outer join keys. Thus, it may execute a NOT MATCHED BY
> SOURCE action, or a NOT MATCHED BY TARGET action, and it's difficult
> to predict which.
>

I set out to rebase this on top of 5f2e179bd3 (support for MERGE into
views), and ended up hacking on it quite a bit. Aside from some
cosmetic stuff, I made 3 bigger changes:

1). It turned out that simply rebasing this didn't work for NOT
MATCHED BY SOURCE actions on an auto-updatable view. This was due to
the fact that transformMergeStmt() puts the quals from a MERGE's join
condition temporarily into query->jointree->quals, as if they were
normal WHERE quals. That's a problem, because when the rewriter
expands a target auto-updatable view with its own WHERE quals, they
end up getting added to the same overall set of WHERE quals, which
transform_MERGE_to_join() then attaches to the JoinExpr that it
constructs. That's not a problem for the INNER/RIGHT joins used
without this patch, but for the LEFT/FULL joins produced when there
are NOT MATCHED BY SOURCE actions, it produces incorrect results,
because the view's WHERE quals on the target relation need to be
underneath the JoinExpr, not on it, to work correctly when the source
row is null.

To fix that, I added a new Query->mergeJoinCondition field to keep the
MERGE join quals separate from the query's WHERE quals during query
rewriting. That seems like a good separation to have on general
grounds anyway, but it's crucial to make this patch work properly. I
added a few more tests and this now seems to work well.

2). Having added Query->mergeJoinCondition, it then made more sense to
use that in the executor to distinguish MATCHED candidate rows from
NOT MATCHED BY SOURCE ones, rather than hacking each individual
action's quals. This avoids an additional qual check for every action.
The executor now builds 3 lists of actions (one per match kind), and
ExecMergeMatched() decides at the start which list it needs to scan,
depending on whether or not the candidate row matches the join quals.
That seems somewhat neater, and helped with the next point.

I'm not entirely happy with this though, since it means that the join
quals get checked a second time when there are NOT MATCHED BY SOURCE
actions. It would be better if it could somehow get that information
out of the underlying join node, but I'm not sure how to do that.

3). Thinking more about what to do if a concurrent update turns a
matched candidate row into a not matched one, and there are both NOT
MATCHED BY SOURCE and NOT MATCHED BY TARGET actions, I think the right
thing to do is to execute one action of each kind, as would happen if
the source and target rows had started out not matching. That's much
better than arbitrarily preferring one kind of NOT MATCHED action over
the other.

That turned out to be relatively easy to achieve -- if
ExecMergeMatched() detects a concurrent update that causes the join
quals to no longer pass when they used to, it switches from the
MATCHED list of actions to the NOT MATCHED BY SOURCE list, before
rescanning and executing the first qualifying action. Then it returns
false instead of true, to cause ExecMerge() to call
ExecMergeNotMatched(), so that it also executes a NOT MATCHED BY
TARGET action. I extended the isolation tests to test that, and the
results look quite good.

That'll need a little tweaking if MERGE gets RETURNING support, since
it won't then be able to execute two actions in a single call to the
ModifyTable node. I think that should be fairly easy to deal with
though, just by setting a flag on the node to indicate that there is a
pending NOT MATCHED BY TARGET action to execute the next time it gets
called.

Regards,
Dean

Attachment Content-Type Size
support-merge-when-not-matched-by-source-v10.patch text/x-patch 105.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-03-04 09:50:27 RE: Some shared memory chunks are allocated even if related processes won't start
Previous Message Amit Langote 2024-03-04 09:40:10 Re: remaining sql/json patches