Re: XLogReadRecord() error in XlogReadTwoPhaseData()

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: XLogReadRecord() error in XlogReadTwoPhaseData()
Date: 2021-11-18 07:05:06
Message-ID: 20211118070506.GA1294133@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 17, 2021 at 05:47:10PM -0500, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > Each of the three failures happened on a sparc64 Debian+gcc machine. I had
> > tried ~8000 iterations on thorntail, another sparc64 Debian+gcc animal,
> > without reproducing this.

> # 'pgbench: error: client 0 script 1 aborted in command 4 query 0: ERROR: could not read two-phase state from WAL at 0/159EF88: unexpected pageaddr 0/0 in log segment 000000010000000000000001, offset 5890048
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-17%2013%3A01%3A24

Two others:
ERROR: could not read two-phase state from WAL at 0/16F1850: invalid record length at 0/16F1850: wanted 24, got 0
-- https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-12%2013%3A01%3A15
ERROR: could not read two-phase state from WAL at 0/1668020: incorrect resource manager data checksum in record at 0/1668020
-- https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-11-16%2015%3A00%3A52

> I suppose "unexpected pageaddr 0/0" is most easily explained by supposing
> that XlogReadTwoPhaseData tried to read a WAL page that hadn't been
> written out yet. Have we got any synchronization around that?

If the WAL address isn't on disk, that error doesn't happen. Instead,
read_local_xlog_page() blocks waiting for the WAL to become available. It's
still possible that we make the WAL region exist, but it somehow doesn't
contain the right data until shortly later. FinishPreparedTransaction() takes
TwoPhaseStateLock and looks for an entry having gxact->valid. EndPrepare()
fills gxact->prepare_end_lsn, then calls MarkAsPrepared() to set gxact->valid
under TwoPhaseStateLock. All freelist (freeGXacts) interaction holds
TwoPhaseStateLock. I'm not seeing a gap in that synchronization.

I don't have a great theory, but here are candidates to examine next:

- Run with wal_debug=on to confirm logged write location matches read location.
- Run "PGDATA=contrib/amcheck/tmp_check/t_003_cic_2pc_CIC_2PC_test_data/pgdata
pg_waldump -s 0/01000000" at the end of the test.
- Dump WAL page binary image at the point of failure.
- Log which branches in XLogReadRecord() are taken.

What else might help?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-11-18 07:22:15 RE: Optionally automatically disable logical replication subscriptions on error
Previous Message Michael Paquier 2021-11-18 06:58:18 Re: pg_upgrade test for binary compatibility of core data types