Re: Multiple full page writes in a single checkpoint?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multiple full page writes in a single checkpoint?
Date: 2021-02-03 23:29:13
Message-ID: 20210203232913.t3fng3evt4qucm3g@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-02-03 18:05:56 -0500, Bruce Momjian wrote:
> log_hint_bits already gives us a unique nonce for the first hint bit
> change on a page during a checkpoint, but we only encrypt on page write
> to the file system, so I am researching if log_hint_bits will already
> generate a unique LSN for every page write to the file system, even if
> there are multiple hint-bit-caused page writes to the file system during
> a single checkpoint. (We already know this works for multiple
> checkpoints.)

No, it won't:

> However, imagine these steps:
>
> 1. checkpoint starts
> 2. page is modified by row or hint bit change
> 3. page gets a new LSN and is marked as dirty
> 4. page image is flushed to WAL
> 5. page is written to disk and marked as clean
> 6. page is modified by data or hint bit change
> 7. page gets a new LSN and is marked as dirty
> 8. page image is flushed to WAL
> 9. checkpoint completes
> 10. page is written to disk and marked as clean
>
> Is the above case valid, and would it cause two full page writes to WAL?
> More specifically, wouldn't it cause every write of the page to the file
> system to use a new LSN?

No. 8) won't happen. Look e.g. at XLogSaveBufferForHint():

    /*
     * Update RedoRecPtr so that we can make the right decision
     */
    RedoRecPtr = GetRedoRecPtr();

    /*
     * We assume page LSN is first data on *every* page that can be passed to
     * XLogInsert, whether it has the standard page layout or not. Since we're
     * only holding a share-lock on the page, we must take the buffer header
     * lock when we look at the LSN.
     */
    lsn = BufferGetLSNAtomic(buffer);

    if (lsn <= RedoRecPtr)
        /* wal log hint bit */

The RedoRecPtr is determined at 1) and doesn't change between 4) and
8). The LSN for 4) has to be *past* the RedoRecPtr from 1), so at 8)
the lsn <= RedoRecPtr check fails. Therefore we don't do another FPW.
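
To make the argument concrete, here is a toy model of that check (a
sketch only, not PostgreSQL code: ToyLSN, ToyState and
toy_hint_bit_change() are made-up names, and the "+ 1" WAL advance is an
arbitrary stand-in for a real record size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLSN;        /* stand-in for XLogRecPtr */

typedef struct ToyState
{
    ToyLSN redo_rec_ptr;        /* fixed when the checkpoint starts, step 1) */
    ToyLSN insert_pos;          /* hypothetical end of WAL, >= redo_rec_ptr */
    ToyLSN page_lsn;            /* LSN currently stored on the page */
} ToyState;

/*
 * Model one hint bit change on the page. Returns true if a full page
 * image was logged, in which case the page also gets a new LSN.
 */
bool
toy_hint_bit_change(ToyState *s)
{
    if (s->page_lsn <= s->redo_rec_ptr)
    {
        /* first change since the checkpoint started: log an FPW */
        s->insert_pos += 1;     /* pretend the record advances WAL */
        s->page_lsn = s->insert_pos;

        /* the new LSN is necessarily past the redo pointer */
        assert(s->page_lsn > s->redo_rec_ptr);
        return true;
    }

    /* page LSN is already past the redo pointer: no FPW, no new LSN */
    return false;
}
```

Starting from redo_rec_ptr = 1000, insert_pos = 1000, page_lsn = 900,
the first call (steps 2-4) logs an image and moves the page LSN to 1001.
A second call (steps 6-8) returns false and leaves the page LSN at 1001,
which is exactly why the second write of the page would reuse the nonce.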

Changing this is *completely* infeasible. In a lot of workloads it'd
cause a *massive* explosion of WAL volume. Like quadratically. You'll
need to find another way to generate a nonce.

In the non-hint bit case you'll automatically have a higher LSN in 7/8
though. So you won't need to do anything about getting a higher nonce.

For the hint bit case in 8) you could consider just using any LSN generated
after 4) (preferably already flushed to disk) - but that seems somewhat
ugly from a debuggability POV :/. Alternatively you could just create a
tiny WAL record to get a new LSN, but that'll sometimes trigger new WAL
flushes when the pages are dirtied.
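
Roughly, the two options could look like this in a toy model (again a
sketch under assumptions, not PostgreSQL code: ToyWal and all three
functions are hypothetical, and the borrow variant here makes no attempt
to prove the borrowed LSN is unique across page writes):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy WAL state; every field here is hypothetical. */
typedef struct ToyWal
{
    uint64_t insert_pos;    /* end of the last WAL record generated */
    uint64_t flushed_upto;  /* WAL is durably on disk up to here */
} ToyWal;

/*
 * Option 1: borrow an LSN that was generated (and flushed) after the
 * page's current LSN. Returns 0 if no such LSN exists yet.
 */
uint64_t
toy_nonce_borrow(const ToyWal *wal, uint64_t page_lsn)
{
    return (wal->flushed_upto > page_lsn) ? wal->flushed_upto : 0;
}

/*
 * Option 2: emit a tiny record purely to obtain a fresh LSN.
 * The "+ 1" is an arbitrary stand-in for the record's size.
 */
uint64_t
toy_nonce_tiny_record(ToyWal *wal)
{
    wal->insert_pos += 1;
    return wal->insert_pos;
}

/*
 * WAL-before-data: a page may only be written out once WAL has been
 * flushed up to the page's LSN.
 */
bool
toy_write_needs_wal_flush(const ToyWal *wal, uint64_t page_lsn)
{
    return page_lsn > wal->flushed_upto;
}
```

In this model option 1 never forces a flush but can come up empty when
no other WAL has been flushed since step 4), while option 2 always
yields a new LSN at the cost of a flush whenever nothing else has
flushed WAL past the tiny record by the time the page is written.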

Greetings,

Andres Freund
