Re: Reduce/eliminate the impact of FPW

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: Daniel Wood <hexexpert(at)comcast(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce/eliminate the impact of FPW
Date: 2020-08-03 20:07:32
Message-ID: CAHg+QDcEkAYYhVXYWjvnTHxNswc2wZ_YYhPpai_+UWy1CNOBog@mail.gmail.com
Lists: pgsql-hackers

Increasing checkpoint_timeout reduces the amount of WAL written to disk,
since fewer full-page images are generated. This has several benefits:
less WAL I/O, less archival load on the system, and less network traffic
to the standby replicas. However, it also increases crash recovery time
and so hurts server availability. Would investing in parallel recovery
for Postgres reduce crash recovery time enough to let us raise
checkpoint_timeout to a much higher value? This idea is orthogonal to the
double-write improvements mentioned in the thread. Thomas Munro has a
patch that prefetches pages during recovery, which speeds up recovery when
the working set doesn't fit in memory. We also need parallel recovery to
replay huge amounts of WAL when the working set is in memory.
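
To make the trade-off concrete, here is a rough back-of-the-envelope model of
how full-page-image volume falls as the checkpoint interval grows. It is only
a sketch: it assumes updates hit pages uniformly at random (real workloads are
skewed), uses the default 8 kB block size, and the function names and the
workload numbers are made up for illustration.

```python
PAGE_SIZE = 8192  # default PostgreSQL block size in bytes


def expected_fpw_pages(total_pages: int, updates_per_sec: float,
                       checkpoint_interval_sec: float) -> float:
    """Expected number of distinct pages touched in one checkpoint cycle.

    Each distinct page costs one full-page image, because only the first
    modification of a page after a checkpoint logs the whole page.
    Assumes uniformly random page access (an illustrative simplification).
    """
    updates = updates_per_sec * checkpoint_interval_sec
    return total_pages * (1.0 - (1.0 - 1.0 / total_pages) ** updates)


def fpw_bytes_per_sec(total_pages: int, updates_per_sec: float,
                      checkpoint_interval_sec: float) -> float:
    """Average full-page-image bytes written to WAL per second."""
    pages = expected_fpw_pages(total_pages, updates_per_sec,
                               checkpoint_interval_sec)
    return pages * PAGE_SIZE / checkpoint_interval_sec


if __name__ == "__main__":
    # Hypothetical workload: 1M pages (~8 GB), 5000 page updates per second.
    for interval in (300, 900, 1800, 3600):
        rate = fpw_bytes_per_sec(1_000_000, 5_000, interval)
        print(f"checkpoint_timeout={interval:>5}s  FPW ~ {rate / 1e6:8.2f} MB/s")
```

The other side of the trade-off is not modelled here: the WAL to replay after
a crash grows roughly with the interval, which is where faster (parallel)
recovery would come in.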

Thanks,
Satya

On Mon, Aug 3, 2020 at 11:14 AM Daniel Wood <hexexpert(at)comcast(dot)net> wrote:

>
> > On 08/03/2020 8:26 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> ...
> > I think this is what's called a double-write buffer, or what was tried
> > some years ago under that name. A significant problem is that you
> > have to fsync() the double-write buffer before you can write the WAL.
>
> I don't think it does need to be fsync'ed before the WAL. If the
> log record has a FPW reference beyond the physical log EOF then we
> don't need to restore the before image because we haven't yet done
> the dirty page write from the cache. The before image only needs
> to be flushed before the dirty page write. Usually this will
> already have been done.
>
> > ... But for short transactions, such as those
> > performed by pgbench, you'd probably end up with a lot of cases where
> > you had to write 3 pages instead of 2, and not only that, but the
> > writes have to be consecutive rather than simultaneous, and to
> > different parts of the disk rather than sequential. That would likely
> > suck a lot.
>
> Wherever you write the before images, in the WAL or into a separate
> file, you would write the same number of pages. I don't understand
> the 3 pages vs 2 pages comment.
>
> And, "different parts of the disk"??? I wouldn't enable the feature
> on spinning media unless I had a dedicated disk for it.
>
> NOTE:
> In the 90's Informix called this the physical log. Restoring at
> crash time restored physical consistency after which redo/undo
> recovery achieved logical consistency. From their docs:
> "If the before-image of a modified page is stored in the physical-log
> buffer, it is eventually flushed from the physical-log buffer to the
> physical log on disk. The before-image of the page plays a critical role in
> restoring data and fast recovery. For more details, see Physical-Log
> Buffer."
>
> > --
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
>
>
>
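
The ordering rule in Daniel's quoted reply (the before-image must be durable
before the dirty page is written, not before the WAL) can be sketched as a toy
model. This is not PostgreSQL or Informix code: PhysicalLog, write_dirty_page
and pages_to_restore are hypothetical names, flush() stands in for fsync, and
the sketch assumes at most one before-image per page per checkpoint cycle.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalLog:
    """Hypothetical append-only store of page before-images."""
    images: list = field(default_factory=list)  # (page_id, before_image)
    durable_upto: int = 0  # images[:durable_upto] would survive a crash

    def append(self, page_id, before_image) -> int:
        """Buffer a before-image; return the position a WAL record would reference."""
        self.images.append((page_id, before_image))
        return len(self.images) - 1

    def flush(self, upto: int) -> None:
        """Stand-in for fsync: make entries 0..upto durable."""
        self.durable_upto = max(self.durable_upto, upto + 1)


def write_dirty_page(disk: dict, plog: PhysicalLog, page_id, new_image,
                     image_pos: int) -> None:
    """Write a dirty page, forcing its before-image to disk first if needed."""
    if image_pos >= plog.durable_upto:
        plog.flush(image_pos)  # usually already done by an earlier flush
    disk[page_id] = new_image


def pages_to_restore(plog: PhysicalLog, wal_refs: list) -> dict:
    """Crash recovery: map page_id -> before-image for pages needing restore.

    A reference at or beyond the durable end of the physical log is skipped:
    the data page cannot have been overwritten, so it is still intact.
    """
    return {plog.images[pos][0]: plog.images[pos][1]
            for pos in wal_refs if pos < plog.durable_upto}


if __name__ == "__main__":
    disk = {1: "A0", 2: "B0"}
    plog = PhysicalLog()
    p1 = plog.append(1, "A0")
    p2 = plog.append(2, "B0")
    write_dirty_page(disk, plog, 1, "A1", p1)  # page 2 stays dirty in cache
    print(pages_to_restore(plog, [p1, p2]))
```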
