Re: fdatasync performance problem with large number of DB files

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-17 03:45:05
Message-ID: CA+hUKGJTERHhpAja3KvwF=0qPTgnFB=QPrR=sHmqWKJ2o2mr2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 16, 2021 at 9:29 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2021/03/16 8:15, Thomas Munro wrote:
> > I don't want to add a hypothetical sync_after_crash=none, because it
> > seems like generally a bad idea. We already have a
> > running-with-scissors mode you could use for that: fsync=off.
>
> I heard that some backup tools sync the database directory when restoring it.
> I guess that those who use such tools might want the option to disable such
> startup sync (i.e., sync_after_crash=none) because it's not necessary.

Hopefully syncfs() will return quickly in that case, without doing any work?

> They can skip that sync by fsync=off. But if they just want to skip only that
> startup sync and make subsequent recovery (or standby server) work with
> fsync=on, they would need to shutdown the server after that startup sync
> finishes, enable fsync, and restart the server. In this case, since the server
> is restarted with the state=DB_SHUTDOWNED_IN_RECOVERY, the startup sync
> would not be performed. This procedure is tricky. So IMO supporting
> sync_after_crash=none would be helpful for this case and simple.

I still do not like this footgun :-) However, perhaps I am being
overly dogmatic. Consider the change in d8179b00, which decided that
I/O errors in this phase should be reported at LOG level rather than
ERROR. In contrast, my "sync_after_crash=wal" mode (which I need to
rebase over this) will PANIC in this case, because any syncing will be
handled through the usual checkpoint codepaths.

Do you think it would be OK to commit this feature with just "fsync"
and "syncfs", and then to continue to consider adding "none" as a
possible separate commit?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-17 03:50:17 Re: Getting better results from valgrind leak tracking
Previous Message Tom Lane 2021-03-17 03:23:42 Re: Getting better results from valgrind leak tracking