Re: Missing pg_control crashes postmaster

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Steele <david(at)pgmasters(dot)net>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brian Faherty <anothergenericuser(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Missing pg_control crashes postmaster
Date: 2018-07-25 15:09:52
Message-ID: 20180725150952.qsviniepf3m4gqzg@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-07-25 10:52:08 -0400, David Steele wrote:
> On 7/25/18 10:37 AM, Andres Freund wrote:
> > On July 25, 2018 7:18:30 AM PDT, David Steele <david(at)pgmasters(dot)net> wrote:
> > >
> > > It seems like an easy win if we can find a safe way to do it, though I
> > > admit that this is only a benefit in corner cases.
> >
> > What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.
>
> I think it's worth preserving pg_control even in the case where there is
> other damage to the cluster. The alternative in this case (if no backup
> exists) is to run pg_resetwal, which means data since the last checkpoint
> will not be written out, causing even more data loss. I have run clusters
> with checkpoint_timeout = 60m, so data loss in this case is a real concern.

Wait, what? How is "data loss in this case is a real concern"? Not
even a remotely realistic scenario where this matters has been described
so far.

> I favor the contrived scenario that helps preserve the current cluster
> instead of a hypothetical newly init'd one. I also don't think that users
> deleting files out of a cluster is all that contrived.

But trying to limp on in that case, and that being helpful, is contrived.

> Adding O_CREAT to open() doesn't seem too complex to me. I'm not really in
> favor of the renaming idea, but I'm not against it either if it gets me a
> copy of the pg_control file.

The problem is that this would just hide the issue for a bit longer while
the server keeps running (with O_CREAT we would no longer PANIC). That can
lead to a lot of follow-up issues, like checkpoints removing old WAL that
would have been useful for data recovery.

Greetings,

Andres Freund