Re: Postgres, fsync, and OSs (specifically linux)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres, fsync, and OSs (specifically linux)
Date: 2018-07-18 19:23:08
Message-ID: CA+TgmoYoVB8LSJ0mufU_N0G3YHw1ZSJVCMXk06GKZaDuboGHjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 29, 2018 at 4:53 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> I've revised the fsync patch with the cleanups discussed and gone through
> the close() calls.
>
> AFAICS either socket closes, temp file closes, or (for WAL) already PANIC on
> close. It's mainly fd.c that needs amendment. Which I've done per the
> attached revised patch.

I think we should have a separate thread for this patch vs. Andres's
patch to do magic things with the checkpointer and file-descriptor
forwarding. Meanwhile, here's some review.

1. No longer applies cleanly.

2. I don't like promote_ioerr_to_panic() very much, partly because the
same pattern gets repeated over and over, and partly because it would
be awkwardly-named if we discovered that another 2 or 3 errors needed
similar handling (or some other variant handling). I suggest instead
having a function like report_critical_fsync_failure(char *path) that
does something like this:

int elevel = ERROR;
if (errno == EIO)
elevel = PANIC;
ereport(elevel,
(errcode_for_file_access(),
errmsg("could not fsync file \"%s\": %m", path);

And similarly I'd add report_critical_close_failure. In some cases,
this would remove wording variations (e.g. in twophase.c) but I think
that's fine, and maybe an improvement, as discussed on another recent
thread.

3. slru.c calls pg_fsync() but isn't changed by the patch. That looks wrong.

4. The comment changes in snapbuild.c interact with the TODO that
immediately follows. I think more adjustment is needed here.

5. It seems odd that you adjusted the comment for
pg_fsync_no_writethrough() but not pg_fsync_writethrough() or
pg_fsync(). Either pg_fsync_writethrough() doesn't have the same
problem, in which case, awesome, but let's add a comment, or it does,
in which case it should refer to the other one. And I think
pg_fsync() itself needs a comment saying that every caller must be
careful to use promote_ioerr_to_panic() or
report_critical_fsync_failure() or whatever we end up calling it
unless the fsync is not critical for data integrity.

6. In md.c, there's a stray blank line added. But more importantly,
the code just above that looks like this:

if (!FILE_POSSIBLY_DELETED(errno) ||
failures > 0)
- ereport(ERROR,
+ ereport(promote_ioerr_to_panic(ERROR),
(errcode_for_file_access(),
errmsg("could not fsync file \"%s\": %m",
path)));
else
ereport(DEBUG1,
(errcode_for_file_access(),
errmsg("could not fsync file \"%s\"
but retrying: %m",
path)));

I might be all wet here, but it seems like if we enter the bottom
branch, we still need the promote-to-panic behavior.

7. The comment adjustment for SyncDataDirectory mentions an
"important" fact about fsync behavior, but then doesn't seem to change
any logic on that basis. I think in general a number of these
comments need a little more thought, but in this particular case, I
think we also need to consider what the behavior should be (and the
comment should reflect our considered judgement on that point, and the
implementation should match).

8. Andres suggested to me off-list that we should have a GUC to
disable the promote-to-panic behavior in case it turns out to be a
show-stopper for some user. I think that's probably a good idea.
Adding many new ways to PANIC in a minor release without providing any
way to go back to the old behavior sounds unfriendly. Surely, anyone
who suffers much from this has really serious other problems anyway,
but all the same I think we should provide an escape hatch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-07-18 19:43:58 Re: patch to allow disable of WAL recycling
Previous Message Jerry Jelinek 2018-07-18 19:22:59 Re: patch to allow disable of WAL recycling