Re: fdatasync performance problem with large number of DB files

From: David Steele <david(at)pgmasters(dot)net>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-20 00:30:54
Message-ID: 453b323b-9532-2885-32f9-8976f2605c60@pgmasters.net
Lists: pgsql-hackers

On 3/19/21 7:16 PM, Thomas Munro wrote:
> Thanks Justin and David. Replies to two emails inline:
>
> Fair point. Here's what I went with:
>
> When set to <literal>fsync</literal>, which is the default,
> <productname>PostgreSQL</productname> will recursively open and
> synchronize all files in the data directory before crash recovery
> begins. The search for files will follow symbolic links for the WAL
> directory and each configured tablespace (but not any other symbolic
> links).
>

+1
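The `fsync` behavior described in the quoted documentation text can be sketched roughly as follows. This is a minimal Python illustration, not PostgreSQL's actual C implementation. The function names (`fsync_path`, `walk_and_fsync`, `sync_data_directory`) are hypothetical. It assumes a POSIX system where a directory can be opened read-only and passed to `fsync()`, which Linux allows. Symlinks are skipped everywhere except `pg_wal` (when it is itself a link) and the entries under `pg_tblspc`.

```python
import os


def fsync_path(path: str) -> None:
    """Open path read-only and fsync() it (directories too, as Linux allows)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def walk_and_fsync(top: str, follow_links: bool, synced: list) -> None:
    """Recursively fsync every regular file and directory under top.

    Symbolic links found below top are followed only if follow_links is True;
    otherwise they are skipped. top itself is opened even if it is a link.
    """
    with os.scandir(top) as it:
        entries = list(it)  # close the directory handle before recursing
    for entry in entries:
        if entry.is_dir(follow_symlinks=follow_links):
            walk_and_fsync(entry.path, follow_links, synced)
        elif entry.is_file(follow_symlinks=follow_links):
            fsync_path(entry.path)
            synced.append(entry.path)
    fsync_path(top)  # sync the directory itself after its contents
    synced.append(top)


def sync_data_directory(datadir: str) -> list:
    """Hypothetical pre-recovery 'fsync' pass; returns the paths synced."""
    synced: list = []
    # 1. The data directory, ignoring every symlink inside it.
    walk_and_fsync(datadir, follow_links=False, synced=synced)
    # 2. pg_wal, but only if it is a symlink to another location.
    wal = os.path.join(datadir, "pg_wal")
    if os.path.islink(wal):
        walk_and_fsync(wal, follow_links=False, synced=synced)
    # 3. Tablespaces: each entry in pg_tblspc is a symlink, so follow links here.
    tblspc = os.path.join(datadir, "pg_tblspc")
    if os.path.isdir(tblspc):
        walk_and_fsync(tblspc, follow_links=True, synced=synced)
    return synced
```

Under these rules, a symlinked `pg_stat_tmp` or log directory is not synced at all. Following links under `pg_tblspc` could also loop if a tablespace contains a link back to one of its ancestors, and the sketch does not guard against that.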

> I thought about adding some text along the lines that such symlinks
> are not expected, but I think you're right that what we really need is
> a good place to point to. I mean, generally you can't mess around
> with the files managed by PostgreSQL and expect everything to keep
> working correctly

With respect to symlinks, I'm not sure that's fair to say. From PG's
perspective it's just a directory/file, after all. Other than pg_wal, I have
seen pg_stat/pg_stat_tmp symlinked sometimes, as well as config files and
the log directory.

pgBackRest takes a pretty liberal approach here. We preserve all
directory/file symlinks no matter where they appear and allow all of them
to be remapped on restore.

> but it wouldn't hurt to make an explicit statement
> about symlinks and where they're allowed (or maybe there is one
> already and I failed to find it).

I couldn't find it either, and I would be in favor of it. For instance,
pgBackRest forbids tablespaces inside PGDATA, and when people complain
(more often than you might imagine) we can just point to the code/docs.
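A check in the spirit of the rule mentioned above (no tablespaces inside PGDATA) might look like the following. This is a hypothetical sketch, not pgBackRest's code. The function name `tablespaces_inside_pgdata` is made up for illustration. It resolves each `pg_tblspc` link with `os.path.realpath()` and flags any target whose path lies under the resolved data directory.

```python
import os


def tablespaces_inside_pgdata(datadir: str) -> list:
    """Return names of pg_tblspc links whose target resolves inside datadir.

    Hypothetical check modelled on the rule described above; not pgBackRest code.
    """
    root = os.path.realpath(datadir)
    tblspc = os.path.join(datadir, "pg_tblspc")
    if not os.path.isdir(tblspc):
        return []
    offenders = []
    for name in sorted(os.listdir(tblspc)):
        target = os.path.realpath(os.path.join(tblspc, name))
        # commonpath() compares whole components, so /pgdata2 is not "inside" /pgdata.
        if os.path.commonpath([root, target]) == root:
            offenders.append(name)
    return offenders
```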

> There are hints though, like
> pg_basebackup's documentation which tells you it won't follow or
> preserve them in general, but... hmm, it also contemplates various
> special subdirectories (pg_dynshmem, pg_notify, pg_replslot, ...) that
> might be symlinks without saying why.

Right, pg_dynshmem is another one that I've seen symlinked. Some things
are nice to have on fast storage. pg_notify and pg_replslot are similar
since they get written to a lot in certain configurations.

>> It worries me that this needs to be explicitly "turned off" after the
>> initial recovery. Seems like something of a foot gun.
>>
>> Since we have not offered this functionality before I'm not sure we
>> should rush to introduce it now. For backup solutions that do their own
>> syncing, syncfs() should provide excellent performance so long as the
>> file system is not shared, which is something the user can control (and
>> is noted in the docs).
>
> Thanks. I'm leaving the 0002 patch "on ice" until someone can explain
> how you're supposed to use it without putting a hole in your foot.

+1
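For comparison, the `syncfs()` approach needs one call per location rather than one `fsync()` per file. The locations are the data directory, `pg_wal` if it is a symlink, and each tablespace link. The sketch below is a hypothetical Python illustration, not the patch itself, and the names `syncfs_path` and `syncfs_data_directory` are made up. It assumes Linux with a C library that exports `syncfs` (glibc 2.14 or later), reached here through `ctypes`.

```python
import ctypes
import ctypes.util
import os

# Assumption: Linux, with a libc that exports syncfs(2).
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def syncfs_path(path: str) -> None:
    """Call syncfs(2) on the file system containing path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if _libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)


def syncfs_data_directory(datadir: str) -> list:
    """Hypothetical 'syncfs' pass: one syncfs() per top-level location."""
    targets = [datadir]
    wal = os.path.join(datadir, "pg_wal")
    if os.path.islink(wal):
        targets.append(wal)  # WAL may live on a different file system
    tblspc = os.path.join(datadir, "pg_tblspc")
    if os.path.isdir(tblspc):
        targets.extend(os.path.join(tblspc, n) for n in sorted(os.listdir(tblspc)))
    for t in targets:
        syncfs_path(t)
    return targets
```

Because `syncfs()` flushes the whole file system, any other workload sharing it pays for the flush too, which is the caveat about shared file systems in the quoted text. Per the Linux man page, `syncfs()` reports writeback errors only from Linux 5.8 onward.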

> (One silly thing I noticed is that our comments generally think
> "filesystem" is one word, but our documentation always has a space;
> this patch followed the local convention in both cases!)

Personally I prefer "file system".

Regards,
--
-David
david(at)pgmasters(dot)net
