Fix for visibility check on 14.5 fails on tpcc with high concurrency

From: Dimos Stamatakis <dimos(dot)stamatakis(at)servicenow(dot)com>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Fix for visibility check on 14.5 fails on tpcc with high concurrency
Date: 2022-11-22 11:38:14
Message-ID: CO2PR0801MB2310579F65529380A4E5EDC0E20A9@CO2PR0801MB2310.namprd08.prod.outlook.com
Lists: pgsql-hackers

Hi hackers,

When running TPC-C via sysbench with high concurrency (96 threads, scale factor 5), we found that a fix for the visibility check (introduced in PostgreSQL 14.5) causes sysbench to fail in 1 out of 70 runs.
The error is the following:

SQL error, errno = 0, state = 'XX000': new multixact has more than one updating member

And it is caused by the following statement:

UPDATE warehouse1
SET w_ytd = w_ytd + 234
WHERE w_id = 3;

The commit that fixes the visibility check is the following:
https://github.com/postgres/postgres/commit/e24615a0057a9932904317576cf5c4d42349b363

We reverted this commit and tpcc no longer fails, which indicates that this change is the cause.
Steps to reproduce:
1. Install sysbench
   https://github.com/akopytov/sysbench
2. Install Percona sysbench-tpcc
   https://github.com/Percona-Lab/sysbench-tpcc
3. Run the sysbench-tpcc prepare step
   # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 --pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database --db-driver=pgsql --tables=1 --threads=96 --scale=5 --time=60 prepare
4. Run the sysbench-tpcc run step
   # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 --pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database --db-driver=pgsql --tables=1 --report-interval=1 --rand-seed=1 --threads=96 --scale=5 --time=60 run
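
Since the failure only shows up in about 1 out of 70 runs, it may help to repeat step 4 in a loop and stop at the first run that reports the error. The following is a minimal Python sketch of such a loop, not something we used ourselves. The helper names (build_command, hit_error, run_until_failure) and the max_runs limit are hypothetical, and the sketch assumes the database prepared in step 3 is reused across runs and that the error text appears in sysbench's stdout or stderr.

```python
import subprocess

# Error text reported by sysbench in the failing runs.
ERROR_TEXT = "new multixact has more than one updating member"


def build_command(mode, user, password, threads=96, scale=5, seconds=60):
    """Build the sysbench-tpcc command line for 'prepare' or 'run'."""
    cmd = [
        "sysbench-tpcc/tpcc.lua",
        "--pgsql-host=localhost",
        "--pgsql-port=5432",
        f"--pgsql-user={user}",
        f"--pgsql-password={password}",
        "--pgsql-db=test_database",
        "--db-driver=pgsql",
        "--tables=1",
        f"--threads={threads}",
        f"--scale={scale}",
        f"--time={seconds}",
    ]
    if mode == "run":
        # These two options appear only in the run step above.
        cmd += ["--report-interval=1", "--rand-seed=1"]
    cmd.append(mode)
    return cmd


def hit_error(output):
    """Return True if the benchmark output contains the multixact error."""
    return ERROR_TEXT in output


def run_until_failure(user, password, max_runs=200, runner=subprocess.run):
    """Repeat the run step; return the 1-based attempt that failed, or None.

    'runner' is injectable so the loop can be exercised without sysbench.
    """
    for attempt in range(1, max_runs + 1):
        result = runner(
            build_command("run", user, password),
            capture_output=True,
            text=True,
        )
        if hit_error(result.stdout + result.stderr):
            return attempt
    return None
```

Calling run_until_failure("{USER}", "{PASSWORD}") after the prepare step returns the number of the first failing run, or None if no run failed within max_runs attempts.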

We tested on a machine with 2 NUMA nodes, 16 physical cores per node, and 2 threads per core, for 64 hardware threads in total. The total memory is 376GB.
The configuration file we used (postgresql.conf) is attached.

This commit was supposed to fix a race condition during the visibility check. Please let us know whether you are aware of this issue and if there is a quick fix.
Any input is highly appreciated.

Thanks,
Dimos

Attachment Content-Type Size
postgresql.conf application/octet-stream 1.4 KB
