Re: [Patch] Checksums for SLRU files

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Checksums for SLRU files
Date: 2018-08-01 10:49:21
Message-ID: CAEepm=17aJiLR=bMeLN8ZFYzQeoHqhg=Zmyf4OpAhmfsi=jvUQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 18, 2018 at 5:54 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Jul 18, 2018 at 5:41 PM, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>> I think we'd want pg_upgrade tests showing an example of each SLRU
>>> growing past one segment, and then being upgraded, and then being
>>> accessed in various different pages and segment files, so that we can
>>> see that we're able to shift the data to the right place successfully.
>>> For example I think I'd want to see that a single aborted transaction
>>> surrounded by many committed transactions shows up in the right place
>>> after an upgrade.
>>
>> Can you elaborate a bit on how to implement this test? I've searched for some automated pg_upgrade tests but didn't find one.
>> Should it be a one-time test script or something "make check-world"-able?

Hmm. This proposal doesn't seem to deal with torn writes. If someone
modifies an 8KB SLRU page and it is partially written out (say,
because your disk has 4KB sectors, and the power cuts out after only
one sector is modified), then during recovery we'll try to read that
block back in and the checksum will be wrong.
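
A minimal sketch of that failure mode, under loud assumptions: the page
layout (checksum in the first 4 bytes), the FNV-1a hash, and all toy_*
names are hypothetical stand-ins, not PostgreSQL's checksum code or the
patch's layout. It models an 8KB page on a disk with 4KB sectors where
only the first sector of the new version is written before power is lost.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192    /* SLRU page size from the text */
#define TOY_SECTOR_SIZE 4096    /* sector size from the example */

/* FNV-1a over everything after the 4-byte checksum field (toy hash only). */
static uint32_t
toy_checksum(const uint8_t *page)
{
    uint32_t h = 2166136261u;

    for (size_t i = sizeof(uint32_t); i < TOY_PAGE_SIZE; i++)
    {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Store the checksum in the first 4 bytes of the page. */
static void
toy_set_checksum(uint8_t *page)
{
    uint32_t c = toy_checksum(page);

    memcpy(page, &c, sizeof(c));
}

/* Returns 1 if the stored checksum matches the page contents. */
static int
toy_verify(const uint8_t *page)
{
    uint32_t stored;

    memcpy(&stored, page, sizeof(stored));
    return stored == toy_checksum(page);
}

/* Returns 1 if a half-written page is seen as a checksum failure. */
static int
toy_torn_write_is_detected(void)
{
    static uint8_t disk[TOY_PAGE_SIZE];     /* old version on disk */
    static uint8_t mem[TOY_PAGE_SIZE];      /* new version in memory */

    memset(disk, 0, TOY_PAGE_SIZE);
    toy_set_checksum(disk);
    assert(toy_verify(disk));               /* old page is valid */

    memcpy(mem, disk, TOY_PAGE_SIZE);
    mem[100] = 0x01;                        /* change in sector 1 */
    mem[TOY_SECTOR_SIZE + 100] = 0x02;      /* change in sector 2 */
    toy_set_checksum(mem);

    /* Power cut: only the first sector reaches the disk. */
    memcpy(disk, mem, TOY_SECTOR_SIZE);

    return !toy_verify(disk);
}
```

The on-disk page now carries the new checksum but a mix of new and old
contents, so verification fails even though nothing was corrupted in
the usual sense.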

The way PostgreSQL usually deals with this problem (and higher level
problems caused by torn writes) is by putting a full page image into
the WAL the first time each page is modified after each checkpoint.
(Other databases use other approaches, such as MySQL's
write-two-copies strategy with a barrier in between, which works
because both copies can't be torn at once. The problem also goes away
if your filesystem somehow magically provides atomic 8KB writes, in
which case you can turn full page writes off.)
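
The "first modification after each checkpoint" rule can be sketched as
a toy model. This is an illustration under assumptions, not the real
xloginsert.c code: ToyLSN and the toy_* functions are hypothetical, and
the comparison against the checkpoint's redo pointer is a simplified
rendering of how regular buffers are handled. It also shows why the
rule depends on each page carrying an LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLSN;    /* stand-in for a WAL position */

/*
 * A full page image is needed when the page was last modified at or
 * before the redo point of the latest checkpoint.
 */
static bool
toy_needs_full_page_image(ToyLSN page_lsn, ToyLSN redo_ptr)
{
    return page_lsn <= redo_ptr;
}

/*
 * Count the full page images emitted for nmods successive changes to
 * one page, with no checkpoint in between. next_lsn is the position
 * the first new WAL record will get.
 */
static int
toy_count_fpis(ToyLSN page_lsn, ToyLSN redo_ptr, ToyLSN next_lsn, int nmods)
{
    int fpis = 0;

    assert(next_lsn > redo_ptr);    /* new WAL lands after the redo point */

    for (int i = 0; i < nmods; i++)
    {
        if (toy_needs_full_page_image(page_lsn, redo_ptr))
            fpis++;
        page_lsn = next_lsn++;      /* the PageSetLSN() step */
    }
    return fpis;
}
```

For example, toy_count_fpis(50, 100, 101, 5) counts one image (only the
first change after the checkpoint), while toy_count_fpis(150, 100, 151, 5)
counts none because the page was already logged in full since the
checkpoint. Without a per-page LSN there is nothing to compare against.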

To reuse the existing machinery, in theory I think you'd call
XLogRegisterBuffer() in every place that modifies an SLRU page, and
PageSetLSN() after inserting the WAL. The problem is that these pages
are not in the regular buffer pool and don't have an LSN in the
standard place, so that won't work. I heard about a project to put
SLRUs into the regular buffer pool, but I don't know the status.
Without that I think you might need to invent equivalent machinery
that can register SLRU buffers with xloginsert.c.

To avoid writing full page images for every SLRU page, you'd probably
want to use something like REGBUF_WILL_INIT to skip FPW for pages
you're zero-initialising (eg in ZeroCLOGPage()). I haven't studied
the synchronisation problems lurking there.
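
A toy illustration of why a zero-initialised page needs no full page
image: replaying a "zero this page" record gives the same result
whatever was on disk before, torn or not. The toy_* names are
hypothetical, and this says nothing about the locking questions
mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192

/* Hypothetical redo routine for a "zero this page" WAL record. */
static void
toy_redo_zero_page(uint8_t *page)
{
    memset(page, 0, TOY_PAGE_SIZE);
}

/*
 * Returns 1 if replay produces identical pages from an intact starting
 * page and from a torn one (half old contents, half garbage).
 */
static int
toy_zero_redo_ignores_old_contents(void)
{
    static uint8_t intact[TOY_PAGE_SIZE];
    static uint8_t torn[TOY_PAGE_SIZE];

    memset(intact, 0x11, TOY_PAGE_SIZE);
    memset(torn, 0x11, TOY_PAGE_SIZE / 2);
    memset(torn + TOY_PAGE_SIZE / 2, 0xEE, TOY_PAGE_SIZE / 2);

    /* Before replay the two pages differ. */
    assert(memcmp(intact, torn, TOY_PAGE_SIZE) != 0);

    toy_redo_zero_page(intact);
    toy_redo_zero_page(torn);

    return memcmp(intact, torn, TOY_PAGE_SIZE) == 0;
}
```

Because redo never reads the old contents, the previous state of the
page does not matter, which is the case REGBUF_WILL_INIT exists for.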

--
Thomas Munro
http://www.enterprisedb.com
