| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
| Cc: | bobergj(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19536: UPDATE RETURNING OLD value is stale after concurrent update when table has a BEFORE UPDATE trigger |
| Date: | 2026-06-28 23:34:59 |
| Message-ID: | CALj2ACXYQrazMMXe_2bjfFx3DLOb0BDJ4hbx+1eprasX3x_+0A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi,
On Sun, Jun 28, 2026 at 2:59 AM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> On Sat, 27 Jun 2026 at 20:22, Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > In short: with a BEFORE UPDATE trigger, the trigger's tuple lock
> > advances tupleid to the concurrently-updated row version, so the later
> > table_tuple_update returns TM_Ok instead of TM_Updated. Since oldSlot
> > is only refreshed on the TM_Updated path, the RETURNING clause keeps
> > the pre-wait value (ctid=(0,1), n=7) instead of the actual
> > concurrently-updated value (ctid=(0,2), n=17). Without the trigger,
> > the wait happens inside table_tuple_update itself, which returns
> > TM_Updated and correctly refreshes oldSlot.
>
> Yes, that analysis seems correct.
Thanks for the patch! This matches the fix I had in mind after my
analysis as well.
A couple of comments:
1/ I think the refetch (which is only needed when RETURNING references
the OLD tuple) seems unnecessary. Fetching the row version comes with
an additional buffer pool lookup (possibly a cache miss), a buffer
pin, etc. Could we instead reuse the tuple that GetTupleForTrigger has
already fetched? I might be overthinking the performance overhead
here, but on busy production systems we can't ignore these costs.
Others may have a different take.
2/
+ context->tmfd.traversed = false;
if (!ExecUpdatePrologue(context, resultRelInfo, tupleid, oldtuple,
slot, NULL))
return NULL;
Setting traversed to false here seems a bit off. Is this needed
because the caller doesn't initialize ModifyTableContext properly?
3/
> This doesn't affect DELETE, because the DELETE code always fetches the
> most recent version of the old tuple just before processing the
> RETURNING clause.
>
> Similarly, it doesn't affect a cross-partition
> UPDATE, which does a DELETE followed by an INSERT.
>
> It also doesn't affect MERGE UPDATE/DELETE, because that does its own
> EPQ handling, rather than relying on the trigger code to do it (see
> 9321c79c86e).
I haven't checked these paths in depth, but I think it's worth adding
tests for all of them.
4/
+ if (context->tmfd.traversed)
+ {
+ if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+ tupleid,
+ SnapshotAny,
+ oldSlot))
+ elog(ERROR, "failed to fetch tuple being updated");
+ }
Could we add some context to the error message, e.g., mentioning that
this happens after processing the BEFORE UPDATE row triggers? That
would make it easier to distinguish from the other similar ones.
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Richard Guo | 2026-06-29 00:39:20 | Re: BUG #19533: Wrong results from WindowAgg run-condition pushdown on count() with EXCLUDE CURRENT ROW |
| Previous Message | Daniel Gustafsson | 2026-06-28 18:54:02 | Re: BUG #19524: NaN handling in btree_gist's float4/float8 opclasses |