Re: BUG #19536: UPDATE RETURNING OLD value is stale after concurrent update when table has a BEFORE UPDATE trigger

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: bobergj(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19536: UPDATE RETURNING OLD value is stale after concurrent update when table has a BEFORE UPDATE trigger
Date: 2026-06-28 23:34:59
Message-ID: CALj2ACXYQrazMMXe_2bjfFx3DLOb0BDJ4hbx+1eprasX3x_+0A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On Sun, Jun 28, 2026 at 2:59 AM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> On Sat, 27 Jun 2026 at 20:22, Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > In short: with a BEFORE UPDATE trigger, the trigger's tuple lock
> > advances tupleid to the concurrently-updated row version, so the later
> > table_tuple_update returns TM_Ok instead of TM_Updated. Since oldSlot
> > is only refreshed on the TM_Updated path, the RETURNING clause keeps
> > the pre-wait value (ctid=(0,1), n=7) instead of the actual
> > concurrently-updated value (ctid=(0,2), n=17). Without the trigger,
> > the wait happens inside table_tuple_update itself, which returns
> > TM_Updated and correctly refreshes oldSlot.
>
> Yes, that analysis seems correct.

Thanks for the patch! This matches the fix I had in mind after my
analysis as well.

A couple of comments:

1/ I think the refetch (which is only needed when RETURNING references
the OLD tuple) seems unnecessary. Fetching the row version comes with
an additional buffer pool lookup (possibly a cache miss), a buffer
pin, etc. Could we instead reuse the tuple that GetTupleForTrigger has
already fetched? I might be overthinking the performance overhead
here, but on busy production systems we can't ignore these costs.
Others may have a different take.

2/
+ context->tmfd.traversed = false;
if (!ExecUpdatePrologue(context, resultRelInfo, tupleid, oldtuple,
slot, NULL))
return NULL;

Setting traversed to false here seems a bit off. Is this needed
because the caller doesn't initialize ModifyTableContext properly?

3/
> This doesn't affect DELETE, because the DELETE code always fetches the
> most recent version of the old tuple just before processing the
> RETURNING clause.
>
> Similarly, it doesn't affect a cross-partition
> UPDATE, which does a DELETE followed by an INSERT.
>
> It also doesn't affect MERGE UPDATE/DELETE, because that does its own
> EPQ handling, rather than relying on the trigger code to do it (see
> 9321c79c86e).

I haven't checked these paths in depth, but I think it's worth adding
tests for all of them.

4/
+ if (context->tmfd.traversed)
+ {
+ if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+ tupleid,
+ SnapshotAny,
+ oldSlot))
+ elog(ERROR, "failed to fetch tuple being updated");
+ }

Could we add some context to the error message, e.g., mentioning that
this happens after processing the BEFORE UPDATE row triggers? That
would make it easier to distinguish from the other similar ones.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Richard Guo 2026-06-29 00:39:20 Re: BUG #19533: Wrong results from WindowAgg run-condition pushdown on count() with EXCLUDE CURRENT ROW
Previous Message Daniel Gustafsson 2026-06-28 18:54:02 Re: BUG #19524: NaN handling in btree_gist's float4/float8 opclasses