Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgbf(at)twiska(dot)com
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-09-25 04:59:09
Message-ID: 86521.1695617949@sss.pgh.pa.us
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Mon, Sep 25, 2023 at 4:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> That one is mine, so let me know if there's something particular
>> you'd like me to check.

> Thanks! But after spending the best part of a day installing and
> compiling stuff at snail's pace, I have just this moment managed to
> reproduce this reliably on an emulator running Debian armhf (= armv7).
> So hopefully I can track the issue down from here without bothering
> build farm owners.

FWIW, here's what I see on mamba:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 pg_comp_crc32c_sb8 (crc=3197562372, crc@entry=4294967295,
    data=data@entry=0xfda6e030, len=<optimized out>) at pg_crc32c_sb8.c:56
56 uint32 a = *p4++ ^ crc;
(gdb) bt
#0 pg_comp_crc32c_sb8 (crc=3197562372, crc@entry=4294967295,
    data=data@entry=0xfda6e030, len=<optimized out>) at pg_crc32c_sb8.c:56
#1 0x0195ae5c in ValidXLogRecord (state=0xfde5cc88, record=0xfda6e018,
    recptr=<optimized out>) at xlogreader.c:1195
#2 0x0195cf18 in XLogDecodeNextRecord (state=state@entry=0xfde5cc88,
    nonblocking=<optimized out>) at xlogreader.c:842
#3 0x0195d0c8 in XLogReadAhead (state=0xfde5cc88,
    nonblocking=nonblocking@entry=false) at xlogreader.c:969
#4 0x01959760 in XLogPrefetcherNextBlock (pgsr_private=4259700408,
    lsn=0xfda6b8a0) at xlogprefetcher.c:496
#5 0x0195a72c in lrq_prefetch (lrq=<optimized out>) at xlogprefetcher.c:256
#6 lrq_complete_lsn (lsn=50520048, lrq=0xfda6b858) at xlogprefetcher.c:294
#7 XLogPrefetcherReadRecord (prefetcher=prefetcher@entry=0xfde5deb8,
    errmsg=errmsg@entry=0xffffdd88) at xlogprefetcher.c:1041
#8 0x01960054 in ReadRecord (xlogprefetcher=0xfde5deb8, emode=emode@entry=15,
    fetching_ckpt=fetching_ckpt@entry=false, replayTLI=replayTLI@entry=3)
    at xlogrecovery.c:3067
#9 0x01963188 in PerformWalRecovery () at xlogrecovery.c:1756
#10 0x01952594 in StartupXLOG () at xlog.c:5470
#11 0x01c26e3c in StartupProcessMain () at startup.c:267
#12 0x01c1b504 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess)
    at auxprocess.c:141
#13 0x01c2276c in StartChildProcess (type=StartupProcess) at postmaster.c:5347
#14 0x01c260ec in PostmasterMain (argc=argc@entry=4,
    argv=argv@entry=0xffffe1e4) at postmaster.c:1457
#15 0x01ee39d8 in main (argc=4, argv=0xffffe1e4) at main.c:198

(gdb) f 1
#1 0x0195ae5c in ValidXLogRecord (state=0xfde5cc88, record=0xfda6e018,
recptr=<optimized out>) at xlogreader.c:1195
1195 COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
(gdb) p *record
$3 = {xl_tot_len = 0, xl_xid = 0, xl_prev = 0, xl_info = 0 '\000',
xl_rmid = 0 '\000', xl_crc = 0}

Sure looks like ValidXLogRecord is assuming that record->xl_tot_len
can be trusted without reservation.
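
To spell out the failure mode: with xl_tot_len = 0, the length expression
in frame 1 is an unsigned subtraction that wraps around, so the CRC routine
is asked to read roughly 4GB starting at the record. A minimal sketch of
that arithmetic follows. It is not the actual xlogreader.c code or any
proposed fix: DemoRecord, DEMO_HEADER_SIZE (assumed to be 24 here) and both
function names are hypothetical stand-ins for XLogRecord, SizeOfXLogRecord
and the real validation logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for XLogRecord.  Field names follow the gdb output
 * above; the layout is illustrative only.
 */
typedef struct DemoRecord
{
    uint32_t    xl_tot_len;
    uint32_t    xl_xid;
    uint64_t    xl_prev;
    uint8_t     xl_info;
    uint8_t     xl_rmid;
    uint32_t    xl_crc;
} DemoRecord;

/* Hypothetical stand-in for SizeOfXLogRecord; the value is an assumption. */
#define DEMO_HEADER_SIZE ((uint32_t) 24)

/*
 * What an unguarded "xl_tot_len - header size" computes.  The subtraction
 * is done in unsigned 32-bit arithmetic, so a too-small xl_tot_len wraps
 * around to a huge length instead of going negative.
 */
uint32_t
demo_unchecked_payload_len(const DemoRecord *rec)
{
    return rec->xl_tot_len - DEMO_HEADER_SIZE;
}

/*
 * Guarded variant: reject any record whose total length is smaller than
 * the fixed header before subtracting.  Returns false, leaving *len_out
 * untouched, when the length cannot be trusted.
 */
bool
demo_checked_payload_len(const DemoRecord *rec, uint32_t *len_out)
{
    assert(rec != NULL && len_out != NULL);

    if (rec->xl_tot_len < DEMO_HEADER_SIZE)
        return false;

    *len_out = rec->xl_tot_len - DEMO_HEADER_SIZE;
    return true;
}
```

For an all-zeroes record like the one printed above, the unchecked variant
yields a length just under 2^32, while the checked variant refuses the
record before any data is read.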

regards, tom lane
