Re: fdatasync performance problem with large number of DB files

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-11 01:20:38
Message-ID: CA+hUKG+QXUbkO2wGW4X5jrqRyD2J6Pnv_qtNzFFs=u0qU038cA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 11, 2021 at 2:00 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2021/03/11 8:30, Thomas Munro wrote:
> > I've run into a couple of users who have just commented that recursive
> > fsync() code out!
>
> BTW, we can skip that recursive fsync() by disabling the fsync GUC even
> without commenting out the code?

Those users wanted fsync=on because they needed to recover to a
normal online system after a crash, but they believed that the
preceding fsync of the data directory was useless, since replaying
the WAL should be enough.  IMHO they were nearly on the right track,
and the prototype patch I linked earlier as [2] was my attempt to
find the specific reasons why that doesn't work and fix them.  So
far, I've figured out that you still have to fsync the WAL files
before replaying them (otherwise you're replaying WAL that
potentially hasn't reached the disk), and also the data files
holding blocks that recovery decided to skip due to BLK_DONE
(otherwise you might skip replay because of a higher LSN on a page
that is in the kernel's cache but not yet on disk).
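
To make the second hazard concrete, here is a minimal standalone
sketch (not PostgreSQL source; the types, field names and scenario
below are simplified stand-ins) of the LSN comparison behind the
BLK_DONE decision, and of why skipping redo is only safe if the page
carrying the higher LSN has actually been made durable:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins, not the real PostgreSQL types. */
typedef uint64_t XLogRecPtr;

typedef struct
{
    XLogRecPtr  page_lsn;   /* LSN stamped on the page by the last change */
    bool        on_disk;    /* false: the version recovery sees is only in
                             * the kernel's page cache */
} PageImage;

typedef enum
{
    BLK_NEEDS_REDO,
    BLK_DONE
} RedoAction;

/*
 * Recovery skips redo of a record when the page it reads already
 * carries an LSN at or beyond that record.  The page it reads may be
 * served from the kernel's cache, so the skipped change is durable
 * only if the data file gets fsync'd, either by the recursive fsync
 * at startup or by a targeted fsync of files holding BLK_DONE blocks.
 */
static RedoAction
decide_redo(const PageImage *page, XLogRecPtr record_lsn)
{
    if (record_lsn <= page->page_lsn)
        return BLK_DONE;
    return BLK_NEEDS_REDO;
}

int
main(void)
{
    /* Page already updated to LSN 0x2000, but never flushed to disk. */
    PageImage   page = {0x2000, false};
    XLogRecPtr  record_lsn = 0x1500;    /* older record being replayed */

    if (decide_redo(&page, record_lsn) == BLK_DONE && !page.on_disk)
        printf("redo skipped, yet the page is cache-only: without an "
               "fsync of the data file, a later OS crash loses it\n");
    return 0;
}

In the real code the comparison lives in
XLogReadBufferForRedoExtended() in xlogutils.c; the point of the
sketch is only that a page served from the kernel's cache can satisfy
the check without being on stable storage, so the file still needs an
fsync before recovery can safely forget about that record.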
