Re: [Patch] Checksums for SLRU files

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Checksums for SLRU files
Date: 2018-08-01 22:38:43
Message-ID: CAEepm=1e91zMk-vZszCOGDtKd=DhMLQjgENRSxcbSEhxuEPpfA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 1, 2018 at 11:06 PM, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>> On 1 Aug 2018, at 13:49, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Hmm. This proposal doesn't seem to deal with torn writes.
>
> That's true, but it's a bit orthogonal to the problem solved by checksums.
> Checksums provide a way to avoid reading a bad page; torn-page protection is about preventing bad writes.

It's a problem if you look at it like this: Without your patch, my
database can recover after power loss. With your patch, a torn SLRU
page can cause recovery to fail. Then my only option is to set
ignore_checksum_failure=on so that my cluster can start up. Without
significant effort I can't tell if the checksum verification failed
because data was arbitrarily corrupted (the reason for this feature to
exist), or because of a torn page (*expected behaviour* on
interruption of a storage system with atomic write size < BLCKSZ).
This may also apply to online filesystem-level backups.

PostgreSQL only requires atomic writes of 512 bytes (see
PG_CONTROL_MAX_SAFE_SIZE), the traditional sector size for disks made
approximately 1980-2010, though as far as I know spinning disks made
this decade use 4KB sectors, and for SSDs there is more variation. I
suppose the theory for torn SLRU page safety today is that the
existing SLRU users all have fully independent values that don't cross
sector boundaries, so torn writes can't corrupt them.
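
To make the sector argument concrete, here is a minimal Python sketch (not
PostgreSQL code). It assumes a 512-byte atomic write unit, a hypothetical
4-byte entry size, CRC-32 as a stand-in for the real page checksum, and a
checksum kept outside the page for simplicity. All helper names are invented
for illustration. It simulates a write that is interrupted after three
sectors and compares that with a single flipped bit.

```python
import zlib

BLCKSZ = 8192   # PostgreSQL's default block size
SECTOR = 512    # assumed atomic write unit of the storage
ENTRY = 4       # hypothetical fixed-size entry; divides SECTOR, so none straddles a sector


def make_page(value):
    """Build a page in which every entry holds the same value."""
    return value.to_bytes(ENTRY, "little") * (BLCKSZ // ENTRY)


def torn_write(old, new, sectors_written):
    """Simulate a crash after only the first `sectors_written` sectors reached disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def entries(page):
    """Decode the page into its list of entry values."""
    return [int.from_bytes(page[i:i + ENTRY], "little")
            for i in range(0, len(page), ENTRY)]


def verify(page, expected_checksum):
    """Whole-page check; CRC-32 stands in for the real page checksum."""
    return zlib.crc32(page) == expected_checksum


old_page = make_page(1)
new_page = make_page(2)
new_checksum = zlib.crc32(new_page)

# Case 1: power loss part-way through writing new_page over old_page.
torn_page = torn_write(old_page, new_page, sectors_written=3)

# Case 2: arbitrary corruption, modelled as one flipped bit in new_page.
flipped = bytearray(new_page)
flipped[100] ^= 0x01
corrupt_page = bytes(flipped)

# Every entry in the torn page is still a complete old or new value...
entries_intact = all(e in (1, 2) for e in entries(torn_page))
# ...yet the whole-page check rejects it, exactly as it rejects real corruption.
torn_ok = verify(torn_page, new_checksum)
corrupt_ok = verify(corrupt_page, new_checksum)

print(entries_intact, torn_ok, corrupt_ok)
```

In this sketch the torn page contains only well-formed entries, but the
verifier returns the same failure for it as for the bit-flipped page, which
is the ambiguity described above.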

--
Thomas Munro
http://www.enterprisedb.com
