Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Dimos Stamatakis <dimos(dot)stamatakis(at)servicenow(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
Date: 2022-11-23 09:18:13
Message-ID: 20221123091813.o475zfmp2fbyafuv@alvherre.pgsql
Lists: pgsql-hackers

Hello Dimos

On 2022-Nov-22, Dimos Stamatakis wrote:

> When running tpcc on sysbench with high concurrency (96 threads, scale
> factor 5) we realized that a fix for visibility check (introduced in
> PG-14.5) causes sysbench to fail in 1 out of 70 runs.
> The error is the following:
>
> SQL error, errno = 0, state = 'XX000': new multixact has more than one updating member

Ouch.

I don't remember any reports of this. Searching, I found this recent
one:
https://postgr.es/m/17518-04e368df5ad7f2ee@postgresql.org

However, the reporter there says they're using 12.10, and according to
src/tools/git_changelog the commit appeared only in 12.12:

Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Branch: master Release: REL_15_BR [adf6d5dfb] 2022-06-27 08:21:08 +0300
Branch: REL_14_STABLE Release: REL_14_5 [e24615a00] 2022-06-27 08:24:30 +0300
Branch: REL_13_STABLE Release: REL_13_8 [7ba325fd7] 2022-06-27 08:24:35 +0300
Branch: REL_12_STABLE Release: REL_12_12 [af530898e] 2022-06-27 08:24:36 +0300
Branch: REL_11_STABLE Release: REL_11_17 [b49889f3c] 2022-06-27 08:24:37 +0300
Branch: REL_10_STABLE Release: REL_10_22 [4822b4627] 2022-06-27 08:24:38 +0300

Fix visibility check when XID is committed in CLOG but not in procarray.
[...]

Thinking further, one problem in tracking this down is that at this
point the multixact in question is *being created*, so we don't have a
WAL trail we could trace through.

I suggest that we could improve that elog() so that it includes the
members of the multixact in question, which could help us better
understand what is going on.
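
As a rough illustration of that suggestion, here is a standalone C sketch
of a "more than one updater" check that lists every member in its error
text. It is not the PostgreSQL code: the types (xid_t, member_status,
member), the helper names, the status labels and the message layout are
all stand-ins invented for this example, and the message is written into
a caller-supplied buffer rather than raised through elog().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the server's XID and member-status types. */
typedef uint32_t xid_t;

typedef enum
{
    ST_FOR_KEY_SHARE,
    ST_FOR_SHARE,
    ST_FOR_NO_KEY_UPDATE,
    ST_FOR_UPDATE,
    ST_NO_KEY_UPDATE,       /* the two statuses after ST_FOR_UPDATE */
    ST_UPDATE               /* are treated as "updating" members */
} member_status;

typedef struct
{
    xid_t         xid;
    member_status status;
} member;

/* Assumption: a member is an updater if its status sorts after FOR UPDATE. */
static bool
is_update(member_status s)
{
    return s > ST_FOR_UPDATE;
}

/* Labels are made up for this sketch. */
static const char *
status_name(member_status s)
{
    static const char *const names[] = {
        "for-key-share", "for-share", "for-no-key-update",
        "for-update", "no-key-update", "update"
    };

    return names[s];
}

/* Write "xid (status), xid (status), ..." into buf, truncating if needed. */
static void
describe_members(char *buf, size_t len, const member *m, int n)
{
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (int i = 0; i < n && used < len - 1; i++)
    {
        snprintf(buf + used, len - used, "%s%u (%s)",
                 i > 0 ? ", " : "", (unsigned) m[i].xid,
                 status_name(m[i].status));
        used = strlen(buf);     /* snprintf always NUL-terminates */
    }
}

/*
 * Return true if at most one member is an updater.  Otherwise return false
 * and fill msg with an error text that includes the full member list.
 */
static bool
check_single_updater(const member *m, int n, char *msg, size_t msglen)
{
    bool has_update = false;

    assert(msglen > 0);
    for (int i = 0; i < n; i++)
    {
        if (!is_update(m[i].status))
            continue;
        if (has_update)
        {
            char desc[256];

            describe_members(desc, sizeof desc, m, n);
            snprintf(msg, msglen,
                     "new multixact has more than one updating member: %s",
                     desc);
            return false;
        }
        has_update = true;
    }
    return true;
}
```

Called with members 100 (for-key-share), 101 (update) and 102
(no-key-update), the check fails and the message names all three XIDs
with their statuses, rather than only saying that two updaters exist.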

> This commit was supposed to fix a race condition during the visibility
> check. Please let us know whether you are aware of this issue and if
> there is a quick fix.

I don't think so.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
