Re: fdatasync performance problem with large number of DB files

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-18 06:46:11
Message-ID: 52453bfc-40fb-7771-05ee-cbd04b6b9b6e@oss.nttdata.com
Lists: pgsql-hackers

On 2021/03/17 12:45, Thomas Munro wrote:
> On Tue, Mar 16, 2021 at 9:29 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>> On 2021/03/16 8:15, Thomas Munro wrote:
>>> I don't want to add a hypothetical sync_after_crash=none, because it
>>> seems like generally a bad idea. We already have a
>>> running-with-scissors mode you could use for that: fsync=off.
>>
>> I heard that some backup tools sync the database directory when restoring it.
>> I guess that those who use such tools might want the option to disable such
>> startup sync (i.e., sync_after_crash=none) because it's not necessary.
>
> Hopefully syncfs() will return quickly in that case, without doing any work?

Yes, on Linux.

>
>> They can skip that sync by fsync=off. But if they just want to skip only that
>> startup sync and make subsequent recovery (or standby server) work with
>> fsync=on, they would need to shutdown the server after that startup sync
>> finishes, enable fsync, and restart the server. In this case, since the server
>> is restarted with the state=DB_SHUTDOWNED_IN_RECOVERY, the startup sync
>> would not be performed. This procedure is tricky. So IMO supporting
>> sync_after_crash=none would be helpful for this case and simple.
>
> I still do not like this footgun :-) However, perhaps I am being
> overly dogmatic. Consider the change in d8179b00, which decided that
> I/O errors in this phase should be reported at LOG level rather than
> ERROR. In contrast, my "sync_after_crash=wal" mode (which I need to
> rebase over this) will PANIC in this case, because any syncing will be
> handled through the usual checkpoint codepaths.
>
> Do you think it would be OK to commit this feature with just "fsync"
> and "syncfs", and then to continue to consider adding "none" as a
> possible separate commit?

+1. The "syncfs" feature is useful whether or not we also support a "none" mode.
It's a good idea to commit "syncfs" first.
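
For readers unfamiliar with the call discussed above, here is a minimal
sketch of what a "syncfs"-style startup sync amounts to on Linux: open
the directory and ask the kernel to flush the whole filesystem that
contains it, instead of opening and fsync()ing every file. The helper
name sync_filesystem_of() is hypothetical and this is not the code from
the patch; it only illustrates the syncfs(2) system call.

```c
#define _GNU_SOURCE             /* syncfs() is a Linux-specific extension */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): flush all dirty data of the
 * filesystem that contains "path". Returns 0 on success, -1 on failure.
 */
static int
sync_filesystem_of(const char *path)
{
    int fd;
    int rc;

    /* Any file descriptor on the target filesystem will do. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return -1;
    }

    /* One call covers every file on that filesystem. */
    rc = syncfs(fd);
    if (rc < 0)
        perror("syncfs");

    close(fd);
    return rc;
}
```

Calling sync_filesystem_of() on the data directory would cover only the
filesystem holding that directory; tablespaces or a WAL directory on
other filesystems would presumably each need their own call.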

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
