Re: Postgres, fsync, and OSs (specifically linux)

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres, fsync, and OSs (specifically linux)
Date: 2018-04-28 15:35:48
Message-ID: 20180428153548.GA24854@nighthawk.caipicrew.dd-dns.de
Lists: pgsql-hackers

Hi,

On Sat, Apr 28, 2018 at 11:21:20AM -0400, Stephen Frost wrote:
> * Craig Ringer (craig(at)2ndquadrant(dot)com) wrote:
> > On 28 April 2018 at 06:28, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > - Add a pre-checkpoint hook that checks for filesystem errors *after*
> > > fsyncing all the files, but *before* logging the checkpoint completion
> > > record. Operating systems, filesystems, etc. all log the error format
> > > differently, but for larger installations it'd not be too hard to
> > > write code that checks their specific configuration.
> >
> > I looked into using trace event file descriptors for this, btw, but
> > we'd need CAP_SYS_ADMIN to create one that captured events for other
> > processes. Plus filtering the events to find only events for the files
> > / file systems of interest would be far from trivial. And I don't know
> > what guarantees we have about when events are delivered.
> >
> > I'd love to be able to use inotify for this, but again, that'd only be
> > a new-kernels thing since it'd need an inotify extension to report I/O
> > errors.
> >
> > Presumably mostly this check would land up looking at dmesg.
> >
> > I'm not convinced it'd get widely deployed and widely used, or that
> > it'd be used correctly when people tried to use it. Look at the
> > hideous mess that most backup/standby creation scripts,
> > archive_command scripts, etc are.
>
> Agree with more-or-less everything you've said here, but a big +1 on
> this. If we do end up going down this route we have *got* to provide
> scripts which we know work and have been tested and are well maintained
> on the popular OS's for the popular filesystems and make it clear that
> we've tested those and not others. We definitely shouldn't put
> something in our docs that is effectively an example of the interface
> but not an actual command that anyone should be using.

This dmesg-checking has been mentioned several times now, but IME
enterprise distributions (or server ops teams?) tend to restrict access
to dmesg and /var/log for non-root users, including postgres.

Well, or just vanilla Debian stable apparently:

postgres@fock:~$ dmesg
dmesg: read kernel buffer failed: Operation not permitted

Is it really a reasonable expectation that the postgres user will be able
to trawl system logs for I/O errors? Or are we expecting the sysadmins (in
case they are distinct from the DBAs) to set up sudo and/or relax
permissions for this everywhere? If so, we should at least document this
requirement properly.
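
For what it's worth, a hook script could at least detect this situation up
front instead of silently finding no errors. What follows is only a rough
sketch, assuming Linux and the kernel.dmesg_restrict sysctl; the function
names are made up for illustration, and it deliberately ignores CAP_SYSLOG
and the separate question of /var/log permissions:

```python
import os

# Linux exposes the kernel.dmesg_restrict sysctl at this path.
DMESG_RESTRICT_PATH = "/proc/sys/kernel/dmesg_restrict"


def parse_dmesg_restrict(text):
    """Return True if the sysctl value means unprivileged reads are blocked."""
    return text.strip() != "0"


def kernel_log_readable(restrict_text, euid):
    """Guess whether a process with this effective uid can read the kernel log.

    Assumption: root can always read it; everyone else only if
    dmesg_restrict is 0. Capabilities such as CAP_SYSLOG are not considered.
    """
    if euid == 0:
        return True
    return not parse_dmesg_restrict(restrict_text)


def check_current_user():
    """Return True/False for the current user, or None if it cannot be told."""
    try:
        with open(DMESG_RESTRICT_PATH) as f:
            restrict_text = f.read()
    except OSError:
        return None  # not Linux, or the sysctl is not exposed
    return kernel_log_readable(restrict_text, os.geteuid())


if __name__ == "__main__":
    result = check_current_user()
    if result is None:
        print("cannot determine whether the kernel log is readable")
    elif result:
        print("kernel log should be readable by this user")
    else:
        print("kernel log is restricted for this user; a dmesg-based check would fail")
```

Run as the postgres user, this would have flagged the Debian box above
before any checkpoint relied on the check.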

The netlink thing from Google that Ted Ts'o mentioned would probably
work around that, but even if that is opened up, it would not be deployed
anytime soon either.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
