Re: pg15b3: recovery fails with wal prefetch enabled

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg15b3: recovery fails with wal prefetch enabled
Date: 2022-09-01 00:05:36
Message-ID: CA+hUKGKPG=yeV8=55KrhrTHBXROvH7z8i_vwxUw4KkrY2hkH7Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> < 2022-08-31 08:44:10.495 CDT >LOG: checkpoint starting: end-of-recovery immediate wait
> < 2022-08-31 08:44:10.609 CDT >LOG: request to flush past end of generated WAL; request 1201/1CAF84F0, current position 1201/1CADB730
> < 2022-08-31 08:44:10.609 CDT >CONTEXT: writing block 0 of relation base/16881/2840_vm
> < 2022-08-31 08:44:10.609 CDT >ERROR: xlog flush request 1201/1CAF84F0 is not satisfied --- flushed only to 1201/1CADB730
> < 2022-08-31 08:44:10.609 CDT >CONTEXT: writing block 0 of relation base/16881/2840_vm
> < 2022-08-31 08:44:10.609 CDT >FATAL: checkpoint request failed
>
> I was able to start it with -c recovery_prefetch=no, so it seems like
> prefetch tried to do too much. The VM runs centos7 under qemu.
> I'm making a copy of the data dir in case it's needed.

Hmm, a page with an LSN set 118208 bytes past the end of WAL. It's a
vm fork page (which recovery prefetch should ignore completely). Did
you happen to get a copy before the successful recovery? After the
successful recovery, what LSN does that page have, and can you find
the references to it in the WAL with e.g. pg_waldump -R 1663/16881/2840
-F vm? Have you turned full_page_writes off (perhaps this is on ZFS?)?
