Re: Online checksums verification in the backend

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-10-29 18:17:29
Message-ID: 20201029181729.2nrub47u7yqncsv7@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-10-28 14:08:52 +0900, Michael Paquier wrote:
> Thanks for confirming. I have gone through the whole set today,
> splitted the thing into two commits and applied them. We had
> buildfarm member florican complain about a mistake in one of the
> GetDatum() calls that I took care of already, and there is nothing
> else on my radar.

The code does IO while holding the buffer mapping lock. That seems
*entirely* unacceptable to me. That basically locks 1/128 of shared
buffers against concurrent mapping changes, while reading data that is
likely not to be on disk? Seriously?

/* see if the block is in the buffer pool or not */
LWLockAcquire(partLock, LW_SHARED);
buf_id = BufTableLookup(&buf_tag, buf_hash);
if (buf_id >= 0)
{
uint32 buf_state;

/*
* Found it. Now, retrieve its state to know what to do with it, and
* release the pin immediately. We do so to limit overhead as much as
* possible. We keep the shared LWLock on the target buffer mapping
* partition for now, so this buffer cannot be evicted, and we acquire
* an I/O Lock on the buffer as we may need to read its contents from
* disk.
*/
bufdesc = GetBufferDescriptor(buf_id);

LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
buf_state = LockBufHdr(bufdesc);
UnlockBufHdr(bufdesc, buf_state);

/* If the page is dirty or invalid, skip it */
if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID) == 0)
{
LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
LWLockRelease(partLock);
return true;
}

/* Read the buffer from disk, with the I/O lock still held */
smgrread(smgr, forknum, blkno, buffer);
LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
}
else
{
/*
* Simply read the buffer. There's no risk of modification on it as
* we are holding the buffer pool partition mapping lock.
*/
smgrread(smgr, forknum, blkno, buffer);
}

The justification in the in-shared-buffers case seems to completely
mis-judge costs too:
* Found it. Now, retrieve its state to know what to do with it, and
* release the pin immediately. We do so to limit overhead as much as
* possible. We keep the shared LWLock on the target buffer mapping
* partition for now, so this buffer cannot be evicted, and we acquire
* an I/O Lock on the buffer as we may need to read its contents from
* disk.
a pin is cheap. Holding the partition lock is not.

Also, using char[BLCKSZ] as a buffer isn't ok. This should use
PGAlignedBlock:
/*
* Use this, not "char buf[BLCKSZ]", to declare a field or local variable
* holding a page buffer, if that page might be accessed as a page and not
* just a string of bytes. Otherwise the variable might be under-aligned,
* causing problems on alignment-picky hardware. (In some places, we use
* this to declare buffers even though we only pass them to read() and
* write(), because copying to/from aligned buffers is usually faster than
* using unaligned buffers.) We include both "double" and "int64" in the
* union to ensure that the compiler knows the value must be MAXALIGN'ed
* (cf. configure's computation of MAXIMUM_ALIGNOF).
*/
typedef union PGAlignedBlock

I think this needs to be quickly reworked or reverted.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-10-29 18:31:06 Re: Online checksums verification in the backend
Previous Message Stephen Frost 2020-10-29 18:05:55 Re: Autovacuum worker doesn't immediately exit on postmaster death