Re: Online checksums patch - once again

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online checksums patch - once again
Date: 2021-02-09 08:54:50
Message-ID: ef9bb1da-ba99-a59f-4102-56026813469f@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(I may have said this before, but) My overall high-level impression of
this patch is that it's really cmmplex for a feature that you use maybe
once in the lifetime of a cluster. I'm happy to review but I'm not
planning to commit this myself. I don't object if some other committer
picks this up (Magnus?).

Now to the latest patch version:

On 03/02/2021 18:15, Daniel Gustafsson wrote:
> The previous v35 had a tiny conflict in pg_class.h, the attached v36 (which is
> a squash of the 2 commits in v35) fixes that. No other changes are introduced
> in this version.

> /*
> * Check to see if my copy of RedoRecPtr is out of date. If so, may have
> * to go back and have the caller recompute everything. This can only
> * happen just after a checkpoint, so it's better to be slow in this case
> * and fast otherwise.
> *
> * Also check to see if fullPageWrites or forcePageWrites was just turned
> * on, or if we are in the process of enabling checksums in the cluster;
> * if we weren't already doing full-page writes then go back and recompute.
> *
> * If we aren't doing full-page writes then RedoRecPtr doesn't actually
> * affect the contents of the XLOG record, so we'll update our local copy
> * but not force a recomputation. (If doPageWrites was just turned off,
> * we could recompute the record without full pages, but we choose not to
> * bother.)
> */
> if (RedoRecPtr != Insert->RedoRecPtr)
> {
> Assert(RedoRecPtr < Insert->RedoRecPtr);
> RedoRecPtr = Insert->RedoRecPtr;
> }
> doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Why does this use DataChecksumsOnInProgress() instead of
DataChecksumsNeedWrite()? If checksums are enabled, you always need
full-page writes, don't you? If not, then why is it needed in the
inprogress-on state?

We also set doPageWrites in InitXLOGAccess(). That should match the
condition above (although it doesn't matter for correctness).

> /*
> * DataChecksumsNeedVerify
> * Returns whether data checksums must be verified or not
> *
> * Data checksums are only verified if they are fully enabled in the cluster.
> * During the "inprogress-on" and "inprogress-off" states they are only
> * updated, not verified (see datachecksumsworker.c for a longer discussion).
> *
> * This function is intended for callsites which have read data and are about
> * to perform checksum validation based on the result of this. To avoid the
> * the risk of the checksum state changing between reading and performing the
> * validation (or not), interrupts must be held off. This implies that calling
> * this function must be performed as close to the validation call as possible
> * to keep the critical section short. This is in order to protect against
> * time of check/time of use situations around data checksum validation.
> */
> bool
> DataChecksumsNeedVerify(void)
> {
> Assert(InterruptHoldoffCount > 0);
> return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
> }

What exactly is the intended call pattern here? Something like this?

smgrread() a data page
HOLD_INTERRUPTS();
if (DataChecksumsNeedVerify())
{
if (pg_checksum_page((char *) page, blkno) != expected)
elog(ERROR, "bad checksum");
}
RESUME_INTERRUPTS();

That seems to be what the code currently does. What good does holding
interrupts do here? If checksums were not fully enabled at the
smgrread() call, the page might have incorrect checksums, and if the
state transitions from inprogress-on to on between the smggread() call
and the DataChecksumsNeedVerify() call, you'll get an error. I think you
need to hold the interrupts *before* the smgrread() call.

> /*
> * Set checksum for a page in private memory.
> *
> * This must only be used when we know that no other process can be modifying
> * the page buffer.
> */
> void
> PageSetChecksumInplace(Page page, BlockNumber blkno)
> {
> HOLD_INTERRUPTS();
> /* If we don't need a checksum, just return */
> if (PageIsNew(page) || !DataChecksumsNeedWrite())
> {
> RESUME_INTERRUPTS();
> return;
> }
>
> ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
> RESUME_INTERRUPTS();
> }

The checksums state might change just after this call, before the caller
has actually performed the smgrwrite() or smgrextend() call. The caller
needs to hold interrupts across this function and the
smgrwrite/smgrextend() call. It is a bad idea to HOLD_INTERRUPTS() here,
because that just masks bugs where the caller isn't holding the
interrupts. Same in PageSetChecksumCopy().

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-02-09 08:57:06 Re: Libpq support to connect to standby server as priority
Previous Message torikoshia 2021-02-09 08:48:55 Re: adding wait_start column to pg_locks