Re: fdatasync performance problem with large number of DB files

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-19 23:16:27
Message-ID: CA+hUKG+z+ET2-o8cX5XBqPTNzNUXzMAZKP2YU8cgz53cdC65OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Justin and David. Replies to two emails inline:

On Sat, Mar 20, 2021 at 12:01 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Fri, Mar 19, 2021 at 06:28:46PM +1300, Thomas Munro wrote:
> > +++ b/doc/src/sgml/config.sgml
>
> > + <productname>PostgreSQL</productname> will recursively open and fsync
> > + all files in the data directory before crash recovery begins. This
>
> Maybe it should say "data, tablespace, and WAL directories", or just "critical
> directories".

Fair point. Here's what I went with:

When set to <literal>fsync</literal>, which is the default,
<productname>PostgreSQL</productname> will recursively open and
synchronize all files in the data directory before crash
recovery
begins. The search for files will follow symbolic links for the WAL
directory and each configured tablespace (but not any other symbolic
links).

> > + {
> > + {"recovery_init_sync_method", PGC_POSTMASTER, ERROR_HANDLING_OPTIONS,
> > + gettext_noop("Sets the method for synchronizing the data directory before crash recovery."),
> > + },
>
> "and tablespaces and WAL"

I feel like that's getting too detailed for the GUC description?

On Sat, Mar 20, 2021 at 2:55 AM David Steele <david(at)pgmasters(dot)net> wrote:
> I had a look at the patch and it looks good to me.

Thanks!

> Should we mention in the docs that the contents of non-standard symlinks
> will not be synced, i.e. anything other than tablespaces and pg_wal? Or
> can we point them to some docs saying not to do that (if such exists)?

Good idea. See above for the adjustment I went with to describe the
traditional behaviour, and then I also made a similar change to the
syncfs part, which, I hope, manages to convey that the new mode
matches the existing policy on symlinks:

On Linux, <literal>syncfs</literal> may be used instead, to ask the
operating system to synchronize the whole file systems that contain the
data directory, the WAL file and each tablespace (but not any other
file systems that may be reachable through symbolic links).

I thought about adding some text along the lines that such symlinks
are not expected, but I think you're right that what we really need is
a good place to point to. I mean, generally you can't mess around
with the files managed by PostgreSQL and expect everything to keep
working correctly, but it wouldn't hurt to make an explicit statement
about symlinks and where they're allowed (or maybe there is one
already and I failed to find it). There are hints though, like
pg_basebackup's documentation which tells you it won't follow or
preserve them in general, but... hmm, it also contemplates various
special subdirectories (pg_dynshmem, pg_notify, pg_replslot, ...) that
might be symlinks without saying why.

> > Ok, I made the changes you suggested. Let's see if anyone else would
> > like to vote for or against the concept of the 0002 patch
> > (recovery_init_sync_method=none).
>
> It worries me that this needs to be explicitly "turned off" after the
> initial recovery. Seems like something of a foot gun.
>
> Since we have not offered this functionality before I'm not sure we
> should rush to introduce it now. For backup solutions that do their own
> syncing, syncfs() should provide excellent performance so long as the
> file system is not shared, which is something the user can control (and
> is noted in the docs).

Thanks. I'm leaving the 0002 patch "on ice" until someone can explain
how you're supposed to use it without putting a hole in your foot.

I pushed the 0001 patch. Thanks to all who reviewed. Of course,
further documentation improvement patches are always welcome.

(One silly thing I noticed is that our comments generally think
"filesystem" is one word, but our documentation always has a space;
this patch followed the local convention in both cases!)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-19 23:51:15 shared memory stats: high level design decisions: consistency, dropping
Previous Message Robert Haas 2021-03-19 23:14:08 Re: [HACKERS] Custom compression methods