Re: BUG #19536: UPDATE RETURNING OLD value is stale after concurrent update when table has a BEFORE UPDATE trigger

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: bobergj(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: BUG #19536: UPDATE RETURNING OLD value is stale after concurrent update when table has a BEFORE UPDATE trigger
Date: 2026-06-27 19:22:14
Message-ID: CALj2ACVThKXik0+X12yEz=78uzUzLZABLj0WKOgH+q-kyvdDcg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On Sat, Jun 27, 2026 at 5:54 AM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 19536
> Logged by: Jonas Boberg
> Email address: bobergj(at)gmail(dot)com
> PostgreSQL version: 18.4
> Operating system: Docker image postgres:18.4-alpine3.23
> Description:

Thanks for reporting the bug and a reproducer!

> In an UPDATE ... RETURNING OLD under isolation level READ COMMITTED, when
> the table has a BEFORE UPDATE trigger, the OLD value is stale after a
> concurrent update.
>
> SELECT version();
> > PostgreSQL 18.4 on aarch64-unknown-linux-musl, compiled by gcc (Alpine
> 15.2.0) 15.2.0, 64-bit
>
> ## How to reproduce
>
> Setup:
> -------
> CREATE TABLE t (
> id int PRIMARY KEY,
> n int NOT NULL
> );
>
> CREATE FUNCTION before_update_noop()
> RETURNS trigger
> LANGUAGE plpgsql AS $$
> BEGIN
> RETURN NEW;
> END;
> $$;
>
> CREATE TRIGGER t_before_update
> BEFORE UPDATE ON t
> FOR EACH ROW EXECUTE FUNCTION before_update_noop();
>
> INSERT INTO t VALUES (1, 7);
> -----
>
> psql session 1:
> ------
> BEGIN ISOLATION LEVEL READ COMMITTED;
>
> UPDATE t
> SET n = n + 10
> WHERE id = 1;
> -- do not commit, hold the row lock
> ------
> Leave session 1 open
>
> psql session 2:
> -----
> BEGIN ISOLATION LEVEL READ COMMITTED;
> UPDATE t
> SET n = n + 1
> WHERE id = 1
> RETURNING
> OLD.n AS old_n,
> NEW.n AS new_n,
> NEW.n - 1 AS expected_old;
> ----
> Session 2 blocks. Now commit in session 1:
> ----
> COMMIT;
> ----
>
> ## Actual result
>
> Session 2 outputs:
> old_n | new_n | expected_old
> -------+-------+--------------
> 7 | 18 | 17
> (1 row)
>
> Here OLD.n is the pre-concurrent update value, while NEW.n is the value
> written by the concurrent update.
>
> ## Expected result
>
> old_n | new_n | expected_old
> -------+-------+--------------
> 17 | 18 | 17
>
> If I drop the trigger and repeat the test, session 2 returns the expected
> result.

I reproduced it on HEAD.

Backend 1 stores its txn-id in the old row's xmax (ctid=(0,1), n=7)
and is still in progress.

Backend 2 tries to lock the same row, sees that xmax holds an
in-progress txn, and waits (ExecUpdate -> ExecUpdatePrologue ->
ExecBRUpdateTriggers -> GetTupleForTrigger -> table_tuple_lock ->
heap_lock_tuple -> XactLockTableWait).

When backend 1 commits, backend 2 wakes up. Because the lock was
requested with TUPLE_LOCK_FLAG_FIND_LAST_VERSION (set under READ
COMMITTED in GetTupleForTrigger), heap_lock_tuple walks the
row-version chain to the latest version and advances tupleid from
(ctid=(0,1), n=7) to (ctid=(0,2), n=17).

In ExecUpdate after ExecUpdatePrologue, tupleid now points at the
latest version (ctid=(0,2), n=17) and the lock is already held, so
table_tuple_update has nothing to wait on and returns TM_Ok. But the
oldSlot - the slot RETURNING clause reads from - is only refreshed on
the TM_Updated path. Since we got TM_Ok, oldSlot still holds the value
captured before waiting (ctid=(0,1), n=7). NEW is correct (ctid=(0,3),
n=18).

In short: with a BEFORE UPDATE trigger, the trigger's tuple lock
advances tupleid to the concurrently-updated row version, so the later
table_tuple_update returns TM_Ok instead of TM_Updated. Since oldSlot
is only refreshed on the TM_Updated path, the RETURNING clause keeps
the pre-wait value (ctid=(0,1), n=7) instead of the actual
concurrently-updated value (ctid=(0,2), n=17). Without the trigger,
the wait happens inside table_tuple_update itself, which returns
TM_Updated and correctly refreshes oldSlot.

I'm adding Dean Rasheed (commit 80feb727c86 v18 - Add OLD/NEW support
to RETURNING in DML queries.) for additional inputs.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bill Kim 2026-06-28 02:14:35 BUG #19524: NaN handling in btree_gist's float4/float8 opclasses
Previous Message Chengpeng Yan 2026-06-27 17:00:20 Re: BUG #19532: Window run-condition optimization can produce wrong results for count() window aggregates