Re: Postgres, fsync, and OSs (specifically linux)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres, fsync, and OSs (specifically linux)
Date: 2018-04-27 23:43:32
Message-ID: 20180427234332.wtkjvtkbpdl6mu6g@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-04-27 19:38:30 -0400, Bruce Momjian wrote:
> On Fri, Apr 27, 2018 at 04:10:43PM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2018-04-27 19:04:47 -0400, Bruce Momjian wrote:
> > > On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wrote:
> > > > - We need more aggressive error checking on close(), for ENOSPC and
> > > > EIO. In both cases afaics we'll have to trigger a crash recovery
> > > > cycle. It's entirely possible to end up in a loop on NFS etc, but I
> > > > don't think there's a way around that.
> > >
> > > If the no-space or write failures are persistent, as you mentioned
> > > above, what is the point of going into crash recovery --- why not just
> > > shut down?
> >
> > Well, I mentioned that as an alternative in my email. But for one we
> > don't really have cases where we do that right now, for another we can't
> > really differentiate between a transient and non-transient state. It's
> > entirely possible that the admin on the system that ran out of space
> > fixes things, clearing up the problem.
>
> True, but if we get a no-space error, odds are it will not be fixed at
> the time we are failing. Wouldn't the administrator check that the
> server is still running after they free the space?

I'd assume it's pretty common that those are separate teams. Given that
we currently don't behave that way for other cases where we *already*
can enter crash-recovery loops I don't think we need to introduce that
here. It's far more common to enter this kind of problem with pg_xlog
filling up the ordinary way. And that can lead to such loops.
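
For illustration only, the close() check quoted at the top of this mail could be sketched roughly as below. This is Python rather than the server's C, and the names close_checked, CrashRecoveryRequired and CRASH_RECOVERY_ERRNOS are hypothetical, not anything in PostgreSQL. It assumes ENOSPC and EIO are the only two errors that get escalated, as the quoted proposal says.

```python
import errno
import os

# Assumption: only the two errno values named in the thread are escalated.
CRASH_RECOVERY_ERRNOS = frozenset({errno.ENOSPC, errno.EIO})


class CrashRecoveryRequired(Exception):
    """Hypothetical signal that the caller has to enter crash recovery."""


def close_checked(fd, close_fn=os.close):
    """Close fd and escalate ENOSPC/EIO instead of silently ignoring them.

    close_fn is injectable so the error paths can be exercised without
    a failing disk; by default it is os.close.
    """
    try:
        close_fn(fd)
    except OSError as exc:
        if exc.errno in CRASH_RECOVERY_ERRNOS:
            # Data may not have reached storage; the caller must not carry on.
            raise CrashRecoveryRequired(os.strerror(exc.errno)) from exc
        raise  # any other error (e.g. EBADF) is passed through unchanged
```

Whether the caller then restarts or shuts down is exactly the open question in this subthread; the sketch only shows the detection step.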

> > > Also, since we can't guarantee that we can write any persistent state
> > > to storage, we have no way of preventing infinite crash recovery
> > > loops, which, based on inconsistent writes, might make things worse.
> >
> > How would it make things worse?
>
> Uh, I can imagine some writes working and some not, and getting things
> more inconsistent. I would say at least that we don't know.

Recovery needs to fix that or we're lost anyway. And we'll retry exactly
the same writes each round.

> > > An additional feature we have talked about is running some kind of
> > > notification shell script to inform administrators, similar to
> > > archive_command. We need this too when sync replication fails.
> >
> > To me that seems like a feature independent of this thread.
>
> Well, if we are introducing new panic-and-not-restart behavior, we might
> need this new feature.

I don't see how this follows. It's easier to externally script
notification for the server having died, than doing it for crash
restarts. That's why we have restart_after_crash=false... There might
be some arguments for this type of notification, but I don't think it
should be conflated with the problem here. Nor is it guaranteed that
such a script could do much, given that disks might be failing and such.
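
As a rough sketch of the external scripting meant here, a watchdog can check whether the postmaster is still alive and notify someone if not. It relies on the first line of postmaster.pid in the data directory holding the postmaster's PID, and on POSIX signal semantics (signal 0 only tests for existence). The names postmaster_alive, check_and_notify and the notify callback are hypothetical, and how it gets scheduled (cron, a supervisor, and so on) is left open.

```python
import os
from pathlib import Path


def postmaster_alive(data_dir):
    """Return True if the PID on line 1 of postmaster.pid is a live process."""
    try:
        lines = (Path(data_dir) / "postmaster.pid").read_text().splitlines()
        pid = int(lines[0])
    except (OSError, IndexError, ValueError):
        return False  # no readable PID file: treat the server as down
    if pid <= 0:
        return False  # never probe process groups
    try:
        os.kill(pid, 0)  # POSIX: signal 0 checks existence, delivers nothing
    except ProcessLookupError:
        return False  # stale PID file, process is gone
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def check_and_notify(data_dir, notify):
    """Call notify(message) if the server is down; return True if it was called."""
    if postmaster_alive(data_dir):
        return False
    notify(f"PostgreSQL in {data_dir} is not running")
    return True
```

With restart_after_crash=false, a dead postmaster stays dead, so a periodic check of this kind is enough. Detecting a crash restart from outside is harder, since the postmaster PID does not change across it.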

Greetings,

Andres Freund
