From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | Dmitry <dsy(dot)075(at)yandex(dot)ru> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Inconsistent update in the MERGE command |
Date: | 2025-08-24 17:34:04 |
Message-ID: | 20250825023404.06f96a85bd806b587ca28d3d@sraoss.co.jp |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 21 Aug 2025 16:30:09 +0300
Dmitry <dsy(dot)075(at)yandex(dot)ru> wrote:
> Hi hackers,
>
> I noticed an inconsistent update when executing MERGE commands, which
> looks more like a bug.
> In my test example, the value of 'val' should increase in an ascending
> monotonous sequence.
>
> Test system
> ===========
> - Architecture: x86_64
> - OS: Ubuntu 24.04.3 LTS (Noble Numbat)
> - Tested postgres version(s):
> - latest 17 (17.6)
>
> Steps to reproduce
> ==================
> postgres=# create table t_merge (id int primary key, val int);
> CREATE TABLE
> postgres=# create table t_merge_chk (val int primary key);
> CREATE TABLE
> postgres=# insert into t_merge values (1,0);
> INSERT 0 1
>
> pgbench --no-vacuum --exit-on-abort -c 10 --file=/dev/stdin <<'EOF'
> begin;
> merge into t_merge t
> using (select 1 id) s on (t.id = s.id)
> when matched then update set val = t.val + 1
> returning val \gset
>
> -- Checking the uniqueness of a value
> insert into t_merge_chk (val) values (:val);
> commit;
> EOF
>
> pgbench: error: client 3 script 0 aborted in command 2 query 0:
> ERROR: duplicate key value violates unique constraint "t_merge_chk_pkey"
> DETAIL: Key (val)=(2) already exists.
>
>
> What do you think about this?
I confirmed this issue by executing the following query concurrently
in three transactions. (With only two transactions, the issue does not occur.)
merge into t_merge t
using (select 1 id) s on (t.id = s.id)
when matched then update set val = t.val + 1
returning old.ctid,old.val,new.ctid,new.val;
Theoretically, each transaction should increment the val value
independently, so the total result should be an increment of three.
However, sometimes (not always, but often) the second and third committed
transactions end up producing the same result.
For example, when the first committed transaction returns:
ctid | val | ctid | val
---------+-----+---------+-----
(2,158) | 4 | (2,159) | 5
(1 row)
the second committed transaction could return:
ctid | val | ctid | val
---------+-----+---------+-----
(2,159) | 5 | (2,160) | 6
(1 row)
and the third committed transaction could return:
ctid | val | ctid | val
---------+-----+---------+-----
(2,159) | 5 | (2,161) | 6
(1 row)
This is wrong, since the old.val and new.val should be 6 and 7, respectively.
I don't completely understand how this race condition occurs,
but I believe the bug is due to the misuse of TM_FailureData
returned by table_tuple_lock in ExecMergeMatched().
Currently, TM_FailureData.ctid is used as a reference to the
latest version of oldtuple, but this is not always correct.
Instead, the tupleid passed to table_tuple_lock should be used.
I've attached a patch to fix this.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-misuse-of-TM_FailureData.ctid-in-ExecMergeMatche.patch | text/x-diff | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Konstantin Knizhnik | 2025-08-24 18:11:36 | Re: Non-reproducible AIO failure |
Previous Message | Mihail Nikalayeu | 2025-08-24 16:52:00 | Re: Adding REPACK [concurrently] |