Re: pg15b3: recovery fails with wal prefetch enabled

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg15b3: recovery fails with wal prefetch enabled
Date: 2022-09-01 00:52:39
Message-ID: 20220901005239.GC31833@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 01, 2022 at 12:05:36PM +1200, Thomas Munro wrote:
> On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: end-of-recovery immediate wait
> > < 2022-08-31 08:44:10.609 CDT >LOG: request to flush past end of generated WAL; request 1201/1CAF84F0, current position 1201/1CADB730
> > < 2022-08-31 08:44:10.609 CDT >CONTEXT: writing block 0 of relation base/16881/2840_vm
> > < 2022-08-31 08:44:10.609 CDT >ERROR: xlog flush request 1201/1CAF84F0 is not satisfied --- flushed only to 1201/1CADB730
> > < 2022-08-31 08:44:10.609 CDT >CONTEXT: writing block 0 of relation base/16881/2840_vm
> > < 2022-08-31 08:44:10.609 CDT >FATAL: checkpoint request failed
> >
> > I was able to start it with -c recovery_prefetch=no, so it seems like
> > prefetch tried to do too much. The VM runs centos7 under qemu.
> > I'm making a copy of the data dir in cases it's needed.
>
> Hmm, a page with an LSN set 118208 bytes past the end of WAL. It's a
> vm fork page (which recovery prefetch should ignore completely). Did
> you happen to get a copy before the successful recovery? After the
> successful recovery, what LSN does that page have, and can you find
> the references to it in the WAL with eg pg_waldump -R 1663/16681/2840
> -F vm? Have you turned FPW off (perhaps this is on ZFS?)?

Yes, I have a copy that reproduces the issue:

#1 0x00000000009a0df4 in errfinish (filename=<optimized out>, filename(at)entry=0xa15535 "xlog.c", lineno=lineno(at)entry=2671, funcname=funcname(at)entry=0xa1da27 <__func__.22763> "XLogFlush") at elog.c:588
#2 0x000000000055f1cf in XLogFlush (record=19795985532144) at xlog.c:2668
#3 0x0000000000813b24 in FlushBuffer (buf=0x7fffdf1f8900, reln=<optimized out>) at bufmgr.c:2889
#4 0x0000000000817a5b in SyncOneBuffer (buf_id=buf_id(at)entry=7796, skip_recently_used=skip_recently_used(at)entry=false, wb_context=wb_context(at)entry=0x7fffffffcdf0) at bufmgr.c:2576
#5 0x00000000008181c2 in BufferSync (flags=flags(at)entry=358) at bufmgr.c:2164
#6 0x00000000008182f5 in CheckPointBuffers (flags=flags(at)entry=358) at bufmgr.c:2743
#7 0x00000000005587b2 in CheckPointGuts (checkPointRedo=19795985413936, flags=flags(at)entry=358) at xlog.c:6855
#8 0x000000000055feb3 in CreateCheckPoint (flags=flags(at)entry=358) at xlog.c:6534
#9 0x00000000007aceaa in CheckpointerMain () at checkpointer.c:455
#10 0x00000000007aac52 in AuxiliaryProcessMain (auxtype=auxtype(at)entry=CheckpointerProcess) at auxprocess.c:153
#11 0x00000000007b0bd8 in StartChildProcess (type=<optimized out>) at postmaster.c:5430
#12 0x00000000007b388f in PostmasterMain (argc=argc(at)entry=7, argv=argv(at)entry=0xf139e0) at postmaster.c:1463
#13 0x00000000004986a6 in main (argc=7, argv=0xf139e0) at main.c:202

It's not on zfs, and FPW have never been turned off.

I should add that this instance has been pg_upgraded since v10.

BTW, base/16881 is the postgres DB )which has 43GB of logfiles imported from
CSV, plus 2GB of snapshots of pg_control_checkpoint, pg_settings,
pg_stat_bgwriter, pg_stat_database, pg_stat_wal).

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_2619', 'main', 0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+------------
1201/1CDD1F98 | -6200 | 1 | 44 | 424 | 8192 | 8192 | 4 | 3681043287
(1 fila)

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_2619', 'vm', 0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+-----------
1201/1CAF84F0 | -6010 | 0 | 24 | 8192 | 8192 | 8192 | 4 | 0

I found this in waldump (note that you had a typoe - it's 16881).

[pryzbyj(at)template0 ~]$ sudo /usr/pgsql-15/bin/pg_waldump -R 1663/16881/2840 -F vm -p /mnt/tmp/15/data/pg_wal 00000001000012010000001C
rmgr: Heap2 len (rec/tot): 64/ 122, tx: 0, lsn: 1201/1CAF2658, prev 1201/1CAF2618, desc: VISIBLE cutoff xid 3681024856 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0 FPW, blkref #1: rel 1663/16881/2840 blk 54
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF3AF8, prev 1201/1CAF2788, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 0
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF3B70, prev 1201/1CAF3B38, desc: VISIBLE cutoff xid 3671427998 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 2
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF4DC8, prev 1201/1CAF3BB0, desc: VISIBLE cutoff xid 3672889900 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 4
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF5FB0, prev 1201/1CAF4E08, desc: VISIBLE cutoff xid 3679743844 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 5
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7320, prev 1201/1CAF5FF0, desc: VISIBLE cutoff xid 3679743844 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 6
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7398, prev 1201/1CAF7360, desc: VISIBLE cutoff xid 3679751919 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 11
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7410, prev 1201/1CAF73D8, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 17
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7488, prev 1201/1CAF7450, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 19
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7500, prev 1201/1CAF74C8, desc: VISIBLE cutoff xid 3645406844 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 23
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7578, prev 1201/1CAF7540, desc: VISIBLE cutoff xid 3669978567 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 24
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF75F0, prev 1201/1CAF75B8, desc: VISIBLE cutoff xid 0 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 25
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7668, prev 1201/1CAF7630, desc: VISIBLE cutoff xid 3681024856 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 26
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF76E0, prev 1201/1CAF76A8, desc: VISIBLE cutoff xid 3671911724 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 27
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7758, prev 1201/1CAF7720, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 34
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF77D0, prev 1201/1CAF7798, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 35
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7EF8, prev 1201/1CAF7810, desc: VISIBLE cutoff xid 3672408544 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 37
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7F70, prev 1201/1CAF7F38, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 38
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF7FE8, prev 1201/1CAF7FB0, desc: VISIBLE cutoff xid 2 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 39
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF8078, prev 1201/1CAF8040, desc: VISIBLE cutoff xid 3678237783 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 41
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF80F0, prev 1201/1CAF80B8, desc: VISIBLE cutoff xid 3672408544 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 42
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF8168, prev 1201/1CAF8130, desc: VISIBLE cutoff xid 3680789266 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 43
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF81E0, prev 1201/1CAF81A8, desc: VISIBLE cutoff xid 3667994218 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 44
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF8258, prev 1201/1CAF8220, desc: VISIBLE cutoff xid 3680789266 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 45
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF82D0, prev 1201/1CAF8298, desc: VISIBLE cutoff xid 3673830395 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 48
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF8348, prev 1201/1CAF8310, desc: VISIBLE cutoff xid 0 flags 0x03, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 50
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF83C0, prev 1201/1CAF8388, desc: VISIBLE cutoff xid 3681024856 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 51
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF8438, prev 1201/1CAF8400, desc: VISIBLE cutoff xid 3681024856 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 52
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1201/1CAF84B0, prev 1201/1CAF8478, desc: VISIBLE cutoff xid 3678741092 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 blk 53
pg_waldump: error: error en registro de WAL en 1201/1CD90E48: invalid record length at 1201/1CD91010: wanted 24, got 0

I could send our WAL to you if that's desirable ..

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-09-01 00:57:22 Re: Solaris "sed" versus pre-v13 plpython tests
Previous Message Alexander Korotkov 2022-09-01 00:51:45 Re: POC: Better infrastructure for automated testing of concurrency issues