From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" |
Date: | 2023-02-01 06:45:34 |
Message-ID: | CA+hUKGKZoCCn+6Z_dmwQsKu5PVLbDu6G7fD6JU1qok84r1ry0A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Feb 1, 2023 at 5:04 PM Anton A. Melnikov <aamelnikov(at)inbox(dot)ru> wrote:
> On 31.01.2023 14:38, Thomas Munro wrote:
> > Here's an experimental patch for that alternative. I wonder if
> > someone would want to be able to turn it off for some reason -- maybe
> > some NFS problem? It's less back-patchable, but maybe more
> > principled?
>
> It looks very strange to me that there may be cases where the cluster data
> is stored in NFS. Can't figure out how this can be useful.
Heh. There are many interesting failure modes, but people do it. I
guess my more general question when introducing any new system call
into this code is how some unusual system I can't even access is going
to break. Maybe some obscure filesystem will fail with EOPNOTSUPP, or
take 5 seconds and then fail because there is no lock server
configured or whatever, so that's why I don't think we can back-patch
it, and we probably need a way to turn it off.
> i think this variant of the patch is a normal solution
> of the problem, not workaround. Found no problems on Linux.
> +1 for this variant.
I prefer it too.
> Might add a custom error message for EDEADLK
> since it absent in errcode_for_file_access()?
Ah, good thought. I think it shouldn't happen™, so it's OK that
errcode_for_file_access() would classify it as ERRCODE_INTERNAL_ERROR.
Other interesting errors are:
ENOLCK: system limits exceeded; PANIC seems reasonable
EOPNOTSUPP: this file doesn't support locking (seen on FreeBSD man
pages, not on POSIX)
> > I don't know if Windows suffers from this type of problem.
> > Unfortunately its equivalent functionality LockFile() looks non-ideal
> > for this purpose: if your program crashes, the documentation is very
> > vague on when exactly it is released by the OS, but it's not
> > immediately on process exit. That seems non-ideal for a control file
> > you might want to access again very soon after a crash, to be able to
> > recover.
>
> Unfortunately i've not had time to reproduce the problem and apply this patch on
> Windows yet but i'm going to do it soon on windows 10. If a crc error
> will occur there, then we might use the workaround from the first
> variant of the patch.
Thank you for investigating. I am afraid to read your results.
One idea would be to proceed with LockFile() for Windows, with a note
suggesting you file a bug with your OS vendor if you ever need it to
get unstuck. Googling this subject, I read that MongoDB used to
suffer from stuck lock files, until an OS bug report led to recent
versions releasing locks more promptly. I find that sufficiently
scary that I would want to default the feature to off on Windows, even
if your testing shows that it does really need it.
> > A thought in passing: if UpdateMinRecoveryPoint() performance is an
> > issue, maybe we should figure out how to use fdatasync() instead of
> > fsync().
>
> May be choose it in accordance with GUC wal_sync_method?
Here's a patch like that. I don't know if it's a good idea for
wal_sync_method to affect other kinds of files or not, but, then, it
already does (fsync_writethough changes this behaviour). I would
only want to consider this if we also stop choosing "open_datasync" as
a default on at least Windows.
Attachment | Content-Type | Size |
---|---|---|
0001-Apply-wal_sync_method-to-pg_control-file.patch | text/x-patch | 5.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | houzj.fnst@fujitsu.com | 2023-02-01 06:49:24 | RE: pub/sub - specifying optional parameters without values. |
Previous Message | Julien Rouhaud | 2023-02-01 06:37:27 | Re: Allow an extention to be updated without a script |