Re: BUG #16894: PANIC: WAL contains references to invalid pages

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Антон Курочкин <antkurochkin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16894: PANIC: WAL contains references to invalid pages
Date: 2021-03-05 12:55:43
Message-ID: fcabffa9-2eb3-0153-c8bc-40516e483338@pgmasters.net
Lists: pgsql-bugs

On 3/4/21 11:32 PM, Michael Paquier wrote:
> On Wed, Mar 03, 2021 at 06:49:02AM -0500, David Steele wrote:
>> OK, but shouldn't we have a full page write for this page after the backup
>> starts, rather than the partial we seem to be seeing here? Is there any
>> condition where the full page write would be skipped legitimately, or does
>> it point to a problem?
>
> That's how things work. If they don't work this way for physical
> backups, we may have a problem. At replay, the full page will be
> replayed if BKPIMAGE_APPLY is correctly set, as per
> XLogReadBufferForRedoExtended(). And XLogRecordAssemble()
> does the decision when building the record (just grep for
> needs_backup).

That's exactly the problem. This WAL record appears to be the only
reference to the page the submitter could find in their WAL. Anton,
could you confirm that?

If we can confirm that there is no FPI for this page after the initial
backup checkpoint, wouldn't that point to an issue?

>> If Postgres is running correctly there is certainly no expectation for
>> support of this unusual use case, but I do think that this possibly points
>> to an issue in Postgres, which under normal circumstances would be very hard
>> to detect.
>
> Well, the report tells that this is an issue that happens on those
> fake files full of zeros, but are you sure that you have the sizing
> right? I still don't see any evidence of anything broken based on the
> information gathered for full backups, FWIW.

This is the size we validate for normal restores, so I'm fairly sure it is
accurate. In fact, postgres doesn't seem to care whether the file is large
enough, as long as it is present. We precreate the files at the correct
size because when postgres extends a file itself, the result is not
sparse and uses more space.

Regards,
--
-David
david(at)pgmasters(dot)net
