Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

From: Dimos Stamatakis <dimos(dot)stamatakis(at)servicenow(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, "simon(dot)riggs(at)enterprisedb(dot)com" <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
Date: 2022-11-24 14:23:33
Message-ID: CO2PR0801MB231058F0579FB6762CECCAF4E20F9@CO2PR0801MB2310.namprd08.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for your feedback!
I applied the patch to print more information about the error. Here’s what I got:

2022-11-23 20:33:03 UTC [638 test_database]: [5458] ERROR: new multixact has more than one updating member: 0 2[248477 (nokeyupd), 248645 (nokeyupd)]
2022-11-23 20:33:03 UTC [638 test_database]: [5459] STATEMENT: UPDATE warehouse1
SET w_ytd = w_ytd + 498
WHERE w_id = 5

I then inspected the WAL and I found the log records for these 2 transactions:


rmgr: MultiXact len (rec/tot): 54/ 54, tx: 248477, lsn: 0/66DB82A8, prev 0/66DB8260, desc: CREATE_ID 133 offset 265 nmembers 2: 248477 (nokeyupd) 248500 (keysh)
rmgr: Heap len (rec/tot): 70/ 70, tx: 248477, lsn: 0/66DB82E0, prev 0/66DB82A8, desc: HOT_UPDATE off 20 xmax 133 flags 0x20 IS_MULTI EXCL_LOCK ; new off 59 xmax 132, blkref #0: rel 1663/16384/16385 blk 422
rmgr: Transaction len (rec/tot): 34/ 34, tx: 248477, lsn: 0/66DBA710, prev 0/66DBA6D0, desc: ABORT 2022-11-23 20:33:03.712298 UTC

rmgr: Transaction len (rec/tot): 34/ 34, tx: 248645, lsn: 0/66DBB060, prev 0/66DBB020, desc: ABORT 2022-11-23 20:33:03.712388 UTC

Attached please find the relevant portion of the WAL.
Thanks for your help on this!

Dimos
[ServiceNow]

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: Thursday, 24. November 2022 at 10:49
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dimos Stamatakis <dimos(dot)stamatakis(at)servicenow(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
[External Email]

On 2022-Nov-23, Peter Geoghegan wrote:

> On Wed, Nov 23, 2022 at 2:54 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Something like the attached. It would result in output like this:
> > WARNING: new multixact has more than one updating member: 0 2[17378 (keysh), 17381 (nokeyupd)]
> >
> > Then it should be possible to trace (in pg_waldump output) the
> > operations of each of the transactions that have any status in the
> > multixact that includes some form of "upd".
>
> That seems very useful.

Okay, pushed to all branches.

> Separately, I wonder if it would make sense to add additional
> defensive checks to FreezeMultiXactId() for this. There is an
> assertion that should catch the presence of multiple updaters in a
> single Multi when it looks like we have to generate a new Multi to
> carry the XID members forward (typically something we only need to do
> during a VACUUM FREEZE). We could at least make that
> "Assert(!TransactionIdIsValid(update_xid));" line into a defensive
> "can't happen" ereport(). It couldn't hurt, at least -- we already
> have a similar relfrozenxid check nearby, added after the "freeze the
> dead" bug was fixed.

Hmm, agreed. I'll see about that separately.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/<https://www.EnterpriseDB.com>
"I'm always right, but sometimes I'm more right than other times."
(Linus Torvalds)

Attachment Content-Type Size
multixact_wal.txt text/plain 4.1 MB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2022-11-24 14:24:21 Re: TAP output format in pg_regress
Previous Message Daniel Gustafsson 2022-11-24 14:11:27 Re: indentation in _hash_pgaddtup()