Re: fdatasync performance problem with large number of DB files

From: Paul Guo <paulguo(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-17 10:42:46
Message-ID: CABQrizeo4K2UEz4V_X0kkvwwkgew9XpxC6-9jPZVA3xEPV=aGg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 17, 2021 at 11:45 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Tue, Mar 16, 2021 at 9:29 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> > On 2021/03/16 8:15, Thomas Munro wrote:
> > > I don't want to add a hypothetical sync_after_crash=none, because it
> > > seems like generally a bad idea. We already have a
> > > running-with-scissors mode you could use for that: fsync=off.
> >
> > I heard that some backup tools sync the database directory when restoring it.
> > I guess that those who use such tools might want the option to disable such
> > startup sync (i.e., sync_after_crash=none) because it's not necessary.
>
> Hopefully syncfs() will return quickly in that case, without doing any work?

I just quickly reviewed the patch (the code part). It looks good. My only
concern or question is about do_syncfs() calling syncfs() on an fd that was
opened through a symlink. I'm not 100% sure about that case.

I think we could consider reviewing and then pushing the syncfs patch now,
since the fsync issue has a real impact. Although there seems to be a
better plan for this in the future, the patch may make the sync step at
startup much faster. Today I encountered a real case in a production
environment for the first time: startup spent more than one hour on the
fsync step. I'm pretty sure that the pgdata was already synced, and in the
pstack output of the startup process I saw it slowly open(), fsync()
(surely doing nothing) and close() the files one by one. pre_sync_fname()
is also a burden in such a scenario. So this issue is really annoying.
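
To make the difference concrete, here is a minimal sketch of the two
strategies. It is not the patch and not PostgreSQL's actual code: the
function names sync_per_file() and sync_whole_fs() are made up for
illustration, error handling is simplified, and it assumes Linux with
glibc, since syncfs() is Linux-specific.

```c
#define _GNU_SOURCE             /* exposes syncfs() and nftw() in glibc */
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical counter, only used to report how many files were synced. */
static long files_synced;

/* nftw() callback: open(), fsync() and close() a single regular file. */
static int
fsync_one(const char *path, const struct stat *sb, int type, struct FTW *ftw)
{
    int fd;
    int rc;

    (void) sb;
    (void) ftw;

    if (type != FTW_F)
        return 0;               /* only regular files in this sketch */

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;               /* skip unreadable files */

    rc = fsync(fd);
    close(fd);
    if (rc == 0)
        files_synced++;
    return 0;                   /* keep walking */
}

/*
 * Per-file strategy: three system calls per file, even when every file
 * is already clean. Returns the number of files synced, or -1 on error.
 */
long
sync_per_file(const char *root)
{
    files_synced = 0;
    if (nftw(root, fsync_one, 64, FTW_PHYS) != 0)
        return -1;
    return files_synced;
}

/*
 * syncfs() strategy: one call flushes the whole filesystem that
 * contains "root". Returns 0 on success, -1 on error.
 */
int
sync_whole_fs(const char *root)
{
    int fd;
    int rc;

    fd = open(root, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = syncfs(fd);
    close(fd);
    return rc;
}
```

With a very large number of relation files, the cost of sync_per_file()
grows with the file count regardless of how much dirty data exists, while
sync_whole_fs() is a single call whose cost depends on the dirty data in
that filesystem.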

Even with the patch, we could discuss further optimizing the special crash
recovery scenario in which users explicitly know that the sync step can be
skipped (this scenario is surely not unusual). The syncfs patch could be
used for this scenario too, but the filesystem might be shared with other
applications (this is not unusual and happens in my customer's
environment, for example), so syncfs() is (probably much) slower than
skipping the sync step.

--
Paul Guo (VMware)
