From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brian Faherty <anothergenericuser(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Missing pg_control crashes postmaster |
Date: | 2018-07-25 15:09:52 |
Message-ID: | 20180725150952.qsviniepf3m4gqzg@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2018-07-25 10:52:08 -0400, David Steele wrote:
> On 7/25/18 10:37 AM, Andres Freund wrote:
> > On July 25, 2018 7:18:30 AM PDT, David Steele <david(at)pgmasters(dot)net> wrote:
> > >
> > > It seems like an easy win if we can find a safe way to do it, though I
> > > admit that this is only a benefit in corner cases.
> >
> > What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.
>
> I think it's worth preserving pg_control even in the case where there is
> other damage to the cluster. The alternative in this case (if no backup
> exists) is to run pg_resetwal which means data since the last checkpoint
> will not be written out causing even more data loss. I have run clusters
> with checkpoint_timeout = 60m so data loss in this case is a real concern.
Wait, what? How is "data loss in this case is a real concern"? Not
even a remotely realistic scenario has been described where this matters
so far.
> I favor the contrived scenario that helps preserve the current cluster
> instead of a hypothetical newly init'd one. I also don't think that users
> deleting files out of a cluster is all that contrived.
But trying to limp on in that case, and that being helpful, is.
> Adding O_CREAT to open() doesn't seem too complex to me. I'm not really in
> favor of the renaming idea, but I'm not against it either if it gets me a
> copy of the pg_control file.
The problem is that that'll just hide the issue for a bit longer, while
the server keeps running (with O_CREAT we'll no longer PANIC). That can
lead to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.
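
To illustrate the concern, here is a minimal sketch of how O_CREAT changes what happens when the file is missing. It is not PostgreSQL code: it is written in Python, whose os.open() passes its flags through to the underlying open() call, and the names open_control_file and demo, as well as the temporary directory standing in for a data directory, are assumptions made for the example.

```python
import errno
import os
import tempfile


def open_control_file(path, create_if_missing):
    """Open `path` read/write, optionally adding O_CREAT (hypothetical helper).

    Returns a short status string rather than a file descriptor so the two
    behaviours are easy to compare.
    """
    flags = os.O_RDWR
    if create_if_missing:
        flags |= os.O_CREAT
    try:
        fd = os.open(path, flags, 0o600)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Without O_CREAT a missing file is reported right away.
            return "missing file detected"
        raise
    os.close(fd)
    return "opened"


def demo():
    """Try both variants against a file that does not exist yet."""
    with tempfile.TemporaryDirectory() as datadir:
        # Stand-in path only; this is not a real cluster.
        path = os.path.join(datadir, "pg_control")

        strict = open_control_file(path, create_if_missing=False)
        exists_after_strict = os.path.exists(path)

        lenient = open_control_file(path, create_if_missing=True)
        exists_after_lenient = os.path.exists(path)

        return strict, exists_after_strict, lenient, exists_after_lenient
```

Without O_CREAT the open fails with ENOENT and the caller learns that the file is gone. With O_CREAT the call succeeds and leaves a freshly created file behind, so nothing at that point signals that the original was ever missing.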
Greetings,
Andres Freund