Re: pg16: invalid page/page verification failed

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg16: invalid page/page verification failed
Date: 2023-10-06 21:36:20
Message-ID: ZSB91GdWEew71XeD@telsasoft.com
Lists: pgsql-hackers

On Fri, Oct 06, 2023 at 08:47:39AM -0700, Andres Freund wrote:
> Hi,
>
> On 2023-10-06 09:20:05 +0900, Michael Paquier wrote:
> > On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > > This table is what it sounds like: a partition into which CSV logs are
> > > COPY'ed. It would've been created around 8am. There's no special
> > > params set for the table nor for autovacuum.
> >
> > This may be an important bit of information. 31966b151e6a is new as
> > of Postgres 16, has changed the way relations are extended and COPY
> > was one area touched. I am adding Andres in CC.
>
> Hm, is there any chance the COPY targets more than one partition? If so, this
> sounds like it might be the issue described here
> https://postgr.es/m/20230925213746.fwqauhhifjgefyzk%40alap3.anarazel.de

The first error was from:
log_time | 2023-10-05 09:57:01.939-05
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

Unfortunately, I no longer have the CSV files which caused errors.
After I moved the broken table out of the way and created a new
partition, they would've been imported successfully, and then removed.

Also, it's sad, but the original 2023_10_05_0900 partition I created was
itself rotated out of existence a few hours ago (I still have the most
interesting lines, though).

I've seen that it's possible for a CSV to include some data that ideally
would've gone into the "next" CSV: 2023-01-01_180000.csv might include a line
of data after 6pm. For example, with log_rotation_age=2min,
postgresql-2023-10-06_120800.csv had a row after 12:10:
2023-10-06 12:10:00.101 CDT,"pryzbyj","pryzbyj",5581,"[local]",65203f66.15cd,2,...

But I'm not sure how that can explain this issue, because this was
095600.csv, and not 095800.csv. My script knows to create the "next"
partition, to handle the case that the file includes some data that
should've gone to the next logfile. I'm handling that case with the
anticipation that there might be a few tenths of a second or even a few
seconds of logs in the wrong file - typically 0 lines and sometimes 1
line. I don't know if it's even possible to have multiple lines in the
"wrong" file. In any case, I'm not expecting log rotation to be 2
minutes behind.

Also, not only was the data in the CSV earlier than 10am, but the error
*itself* was also earlier. The error importing the CSV was at 9:57, so
the CSV couldn't have had data after 10:00. Not that it matters, but my
script doesn't import the most recent logfile, and also avoids importing
files written within the last minute.
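
For illustration only, the file-selection rule described above (skip the most
recent logfile, and skip anything written within the last minute) could be
sketched roughly as below. This is not the actual import script; the function
name `importable`, the use of modification times to decide which file is
"most recent", and the 60-second threshold parameter are all assumptions made
for the sketch.

```python
def importable(mtimes, now, min_age=60.0):
    """Pick logfiles that look safe to import.

    mtimes:  dict mapping file path -> modification time (epoch seconds)
    now:     current time (epoch seconds)
    min_age: skip files modified more recently than this many seconds
    """
    if not mtimes:
        return []
    # Assumption: the file with the latest mtime is the one still being written.
    newest = max(mtimes, key=mtimes.get)
    return sorted(
        path
        for path, mtime in mtimes.items()
        if path != newest and now - mtime >= min_age
    )
```

The `mtimes` mapping could be built with `os.path.getmtime()` over the log
directory; passing it in (along with `now`) just keeps the rule itself easy
to check in isolation.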

I don't see how a CSV with a 2 minute interval of data beginning at 9:56
could straddle hourly partitions.
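
To make that arithmetic concrete, here is a minimal sketch (again, not the
real script) that works out which hourly partitions a rotated CSV could touch.
It assumes the filename layout seen in the examples above
(postgresql-YYYY-MM-DD_HHMMSS.csv) and partition suffixes shaped like
2023_10_05_0900; the function name `hourly_partitions` and the 5-second
`slack` allowance for late rows are hypothetical.

```python
from datetime import datetime, timedelta


def hourly_partitions(csv_name,
                      rotation_age=timedelta(minutes=2),
                      slack=timedelta(seconds=5)):
    """Return the hourly partition suffixes a rotated CSV log could touch."""
    # Strip any directory, then the assumed "postgresql-" prefix and ".csv" suffix.
    base = csv_name.rsplit("/", 1)[-1]
    stamp = base[len("postgresql-"):-len(".csv")]
    start = datetime.strptime(stamp, "%Y-%m-%d_%H%M%S")
    # Latest timestamp expected in the file: one rotation interval plus slack.
    end = start + rotation_age + slack

    hour = start.replace(minute=0, second=0, microsecond=0)
    parts = []
    while hour <= end:
        parts.append(hour.strftime("%Y_%m_%d_%H00"))
        hour += timedelta(hours=1)
    return parts
```

Under these assumptions, a file starting at 09:56 (or 09:52) maps only to the
0900 partition, while a file starting at 09:58 is the first one that could
also reach the 1000 partition.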

log_time | 2023-10-05 09:57:01.939-05
left | invalid page in block 119 of relation base/16409/801594131
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

log_time | 2023-10-05 09:57:01.939-05
left | page verification failed, calculated checksum 5074 but expected 50
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

--
Justin
