Re: Two fsync related performance issues?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Paul Guo <pguo(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Two fsync related performance issues?
Date: 2020-10-05 01:38:32
Message-ID: CA+hUKGKT6XiPiEJrqeOFGi7RYCGzbBysF9pyWwv0-jm-oNajxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 9, 2020 at 3:49 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Thu, Sep 3, 2020 at 11:30 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > On Wed, May 27, 2020 at 12:31 AM Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> > > On Tue, 12 May 2020, 08:42 Paul Guo, <pguo(at)pivotal(dot)io> wrote:
> > >> 1. StartupXLOG() does fsync on the whole data directory early in the crash recovery. I'm wondering if we could skip some directories (at least the pg_log/, table directories) since wal, etc could ensure consistency. Here is the related code.
> > >>
> > >> if (ControlFile->state != DB_SHUTDOWNED &&
> > >> ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> > >> {
> > >> RemoveTempXlogFiles();
> > >> SyncDataDirectory();
> > >> }
>
> > 4. In datadir_fsync_fname(), if ParseRelationPath() is able to
> > recognise a file as being a relation file, build a FileTag and call
> > RegisterSyncRequest() to tell the checkpointer to sync this file as
> > part of the end checkpoint (currently the end-of-recovery checkpoint,
> > but that could also be relaxed).
>
> For the record, Andres Freund mentioned a few problems with this
> off-list and suggested we consider calling Linux syncfs() for each top
> level directory that could potentially be on a different filesystem.
> That seems like a nice idea to look into.

Here's an experimental patch to try that. One problem is that before
Linux 5.8, syncfs() doesn't report failures[1]. I'm not sure what to
think about that; in the current coding we just log them and carry on
anyway, but any theoretical problems that causes for BLK_DONE should
be moot anyway because of FPIs which result in more writes and syncs.
Another is that it may affect other files that aren't under pgdata as
collateral damage, but that seems acceptable. It also makes me a bit
sad that this wouldn't help other OSes.

(Archeological note: The improved syncfs() error reporting is linked
to 2018 PostgreSQL/Linux hacker discussions[2], because it was thought
that syncfs() might be useful for checkpointing, though I believe
since then things have moved on and the new thinking is that we'd use
a new proposed interface to read per-filesystem I/O error counters
while checkpointing.)

[1] https://man7.org/linux/man-pages/man2/sync.2.html
[2] https://lwn.net/Articles/752063/

Attachment Content-Type Size
0001-Use-syncfs-for-SyncDataDirectory-on-Linux.patch text/x-patch 5.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-10-05 01:40:33 Re: enable_incremental_sort changes query behavior
Previous Message k.jamison@fujitsu.com 2020-10-05 01:29:07 RE: [Patch] Optimize dropping of relation buffers using dlist