Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date: 2023-02-01 04:04:09
Message-ID: 2a64f915-213b-4bd0-1cdf-bd987694021e@inbox.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Thomas!

There are two variants of the patch now.

1) As for the first workaround:

On 31.01.2023 07:09, Thomas Munro wrote:
>
> Maybe it's unlikely that two samples will match while running that
> torture test, because it's overwriting the file as fast as it can.
> But the idea is that a real system isn't writing the control file most
> of the time.
>
........
> Yeah, I was thinking that we should also put a limit on the loop, just
> to be cautious.

At first i didn’t understand that the equality condition with the previous
calculated crc and the current one at the second+ attempts was intended
for the case when the pg_control file is really corrupted.

Indeed, by making a few debugging variables and running the tortue test,
i found that there were ~4000 crc errors (~0.0003%) in ~125 million reads from the file,
and there was no case when the crc error appeared twice in a row.
So the second and moreover the third successive occurrence of an crc error
can be neglected and for this workaround seems a simpler and maybe more clear
algorithm is possible.
For instance:

for(try = 0 ; try < 3; try++)
{
open, read from and close pg_control;
calculate crc;

*crc_ok_p = EQ_CRC32C(crc, ControlFile->crc);

if(*crc_ok_p)
break;
}

2) As for the second variant of the patch with POSIX locks:

On 31.01.2023 14:38, Thomas Munro wrote:
> On Tue, Jan 31, 2023 at 5:09 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> Clearly there is an element of speculation or superstition here. I
>> don't know what else to do if both PostgreSQL and ext4 decided not to
>> add interlocking. Maybe we should rethink that. How bad would it
>> really be if control file access used POSIX file locking? I mean, the
>> writer is going to *fsync* the file, so it's not like one more wafer
>> thin system call is going to hurt too much.
>
> Here's an experimental patch for that alternative. I wonder if
> someone would want to be able to turn it off for some reason -- maybe
> some NFS problem? It's less back-patchable, but maybe more
> principled?

It looks very strange to me that there may be cases where the cluster data
is stored in NFS. Can't figure out how this can be useful.

i think this variant of the patch is a normal solution
of the problem, not workaround. Found no problems on Linux.
+1 for this variant.

Might add a custom error message for EDEADLK
since it absent in errcode_for_file_access()?

> I don't know if Windows suffers from this type of problem.
> Unfortunately its equivalent functionality LockFile() looks non-ideal
> for this purpose: if your program crashes, the documentation is very
> vague on when exactly it is released by the OS, but it's not
> immediately on process exit. That seems non-ideal for a control file
> you might want to access again very soon after a crash, to be able to
> recover.

Unfortunately i've not had time to reproduce the problem and apply this patch on
Windows yet but i'm going to do it soon on windows 10. If a crc error
will occur there, then we might use the workaround from the first
variant of the patch.

> A thought in passing: if UpdateMinRecoveryPoint() performance is an
> issue, maybe we should figure out how to use fdatasync() instead of
> fsync().

May be choose it in accordance with GUC wal_sync_method?

Sincerely yours,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2023-02-01 04:29:49 Re: Progress report of CREATE INDEX for nested partitioned tables
Previous Message David Rowley 2023-02-01 03:52:07 Re: Generating "Subplan Removed" in EXPLAIN