fdatasync performance problem with large number of DB files

From: Michael Brown <michael(dot)brown(at)discourse(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: fdatasync performance problem with large number of DB files
Date: 2021-03-10 20:21:54
Message-ID: 11bc2bb7-ecb5-3ad0-b39f-df632734cd81@discourse.org
Lists: pgsql-hackers

I initially posted this on the pgsql-general mailing list [5] but was
advised to also post this to the -hackers list as it deals with internals.

We've encountered a production performance problem with pg13 caused by
how it fsyncs the whole data directory in certain scenarios; it is
related to what Paul (bcc'ed) described in a post to pgsql-hackers [1].


We've observed that the full recursive fsync is triggered when:

* pg_basebackup receives a streaming backup (via [2] fsync_dir_recurse
or fsync_pgdata) unless --no-sync is specified
* postgres starts up after an unclean shutdown (via [3] SyncDataDirectory)

We run multiple postgres clusters and some of those clusters have many
(~450) databases (one database per customer), meaning that the postgres
data directory has around 700,000 files.

On one of our less loaded servers this takes ~7 minutes to complete, but
on another [4] this takes ~90 minutes.
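As a rough back-of-the-envelope check, assuming the time is spread evenly over all ~700,000 files (a simplification; directory syncs and uneven file sizes are ignored), the two timings correspond to these per-file costs:

```latex
\frac{7 \times 60\ \text{s}}{700{,}000\ \text{files}} \approx 0.6\ \text{ms/file},
\qquad
\frac{90 \times 60\ \text{s}}{700{,}000\ \text{files}} \approx 7.7\ \text{ms/file},
\qquad
\frac{90}{7} \approx 12.9
```

In other words, each per-file sync costs roughly 13 times as much on the second server; this is an estimate derived from the totals above, not a per-call measurement.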

Obviously this is an untenable risk. We've modified our process that
bootstraps a replica via pg_basebackup to instead do "pg_basebackup
--no-sync…" followed by a "sync", but we don't have any way to do the
equivalent for postgres startup.

I presume the reason postgres doesn't blindly run a sync() is that we
don't know what other I/O is on the system and it'd be rude to affect
other services. That makes sense, except that in our environment the work
done by the recursive fsync is orders of magnitude more disruptive than
a sync().

My questions are:

* is there a knob we can configure that we've missed?
* can we get an opt-in knob to use a single sync() call instead of a
recursive fsync()?
* would you be open to merging a patch providing said knob?
* is there something else we missed?
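One way the requested opt-in knob could look is sketched below: a setting that defaults to the current recursive behaviour and, when explicitly enabled, replaces it with a single sync(). Everything here is hypothetical: the names startup_sync_method, parse_startup_sync and sync_data_directory, and the values "fsync" and "sync", do not exist in PostgreSQL 13. The recursive walker is passed in as a function pointer so the sketch does not depend on any particular implementation of it. Note that sync() returns void, so unlike fsync() it cannot report write errors, and it flushes every filesystem on the host, not just the data directory.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical setting: how to make the data directory durable after an
 * unclean shutdown. Not an existing PostgreSQL option. */
typedef enum {
    STARTUP_SYNC_FSYNC,  /* default: fsync every file recursively */
    STARTUP_SYNC_SYNC    /* opt-in: one sync() for the whole system */
} startup_sync_method;

/* Parse the setting's string value; returns 0 on success, -1 if unknown. */
static int parse_startup_sync(const char *value, startup_sync_method *out)
{
    if (strcmp(value, "fsync") == 0) {
        *out = STARTUP_SYNC_FSYNC;
        return 0;
    }
    if (strcmp(value, "sync") == 0) {
        *out = STARTUP_SYNC_SYNC;
        return 0;
    }
    return -1;
}

/* Dispatch on the setting. `recursive_fsync` is only called in the
 * default mode, so it may be NULL when the opt-in mode is selected. */
static int sync_data_directory(const char *pgdata, startup_sync_method m,
                               int (*recursive_fsync)(const char *))
{
    if (m == STARTUP_SYNC_SYNC) {
        sync();          /* flushes all filesystems; no error reporting */
        return 0;
    }
    return recursive_fsync(pgdata);
}
```

Keeping the recursive walk as the default preserves today's behaviour for everyone who does not opt in, which matches the concern that a system-wide sync() could disturb other services sharing the host.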


[4]: It should be identical config-wise. It isn't starved for I/O but
does have other regular write workloads.

Michael Brown
Civilized Discourse Construction Kit, Inc.
