Re: Enabling Checksums

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-04-12 20:03:48
Message-ID: 516868A4.6000403@vmware.com
Lists: pgsql-hackers

On 12.04.2013 22:31, Bruce Momjian wrote:
> On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
>>> Only point worth discussing is that this change would make backup blocks be
>>> covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record
>>> header is covered by a CRC32 but the backup blocks only by 16-bit.
>>
>> That means we will have to do the verification for this in
>> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
>> won't always recognize the end of WAL correctly.
>> And I am a bit wary of reducing the likelihood of noticing the proper
>> end-of-recovery by reducing the crc width.
>>
>> Why again are we doing this now? Just to reduce the overhead of CRC
>> computation for full page writes? Or are we foreseeing issues with the
>> page checksums being wrong because of non-zero data in the hole being
>> zero after the restore from bkp blocks?
>
> I thought the idea is that we were going to re-use the already-computed
> CRC checksum on the page, and we only have 16-bits of storage for that.

No, the patch has to compute the 16-bit checksum for the page when the
full-page image is added to the WAL record. There would otherwise be no
need to calculate the page checksum at that point, but only later when
the page is written out from shared buffer cache.

I think this is a bad idea. It complicates the WAL format significantly.
Simon's patch didn't include the changes to recovery to validate the
checksum, but I suspect it would be complicated. And it reduces the
error-detection capability of WAL recovery. Keep in mind that page
checksums are never expected to fail, so even if we miss a few errors
it's still better than nothing. The WAL checksum is different: it is
used to detect end-of-WAL, so a failure is expected every time we do
crash recovery. Thus far, we've considered a probability of one in
2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.
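
To put rough numbers on that, here is a minimal sketch. It assumes the
garbage found after the true end of WAL behaves like uniformly random
bits with respect to the checksum, so an n-bit checksum accepts a bogus
record with probability 2^-n. The function names and the figure of 1000
crash recoveries are hypothetical, chosen only for illustration.

```python
import math


def false_accept_probability(checksum_bits: int) -> float:
    """Chance that random garbage passes an n-bit checksum.

    Assumption: checksum values of garbage data are uniformly distributed.
    """
    return 2.0 ** -checksum_bits


def prob_any_false_accept(checksum_bits: int, recoveries: int) -> float:
    """Chance of at least one false accept over several independent recoveries."""
    p = false_accept_probability(checksum_bits)
    # 1 - (1 - p)^recoveries, computed in a numerically stable way.
    return -math.expm1(recoveries * math.log1p(-p))


if __name__ == "__main__":
    for bits in (32, 16):
        print(bits, false_accept_probability(bits),
              prob_any_false_accept(bits, recoveries=1000))
```

Under that assumption the 16-bit checksum is 65536 times more likely to
accept garbage at the end of WAL than the 32-bit CRC. Over 1000 crash
recoveries that works out to about a 1.5% chance of at least one false
accept, versus roughly 2.3e-7 with 32 bits.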

If you want to speed up the CRC calculation of full-page images, you
could have an optimized version of the WAL CRC algorithm, using e.g.
SIMD instructions. Because typical WAL records are small, max 100-200
bytes, and consist of several even smaller chunks, the normal WAL
CRC calculation is quite resistant to common optimization techniques.
But it might work for the full-page images. Let's not conflate it with
the page checksums, though.
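
As an illustration of the difference in shape between the two cases,
here is a small sketch using Python's zlib.crc32 as a stand-in. It is
not the WAL CRC implementation; the chunk sizes, the 8192-byte page size
and the helper name are assumptions made up for the example. A small
record's CRC is accumulated across several short chunks, each call
carrying the running value forward, whereas a full-page image is one
long contiguous buffer, which is where a bulk-optimized routine would
have room to help.

```python
import zlib


def crc_of_chunks(chunks):
    """Accumulate a CRC-32 across chunks, carrying the running value forward."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc


# Hypothetical small record: a few short pieces, 160 bytes in total.
record_chunks = [b"\x01" * 24, b"\x02" * 40, b"\x03" * 96]

# Hypothetical full-page image: one contiguous 8192-byte buffer.
page_image = bytes(range(256)) * 32

if __name__ == "__main__":
    # Chunked accumulation gives the same result as one pass over the
    # concatenated bytes; only the call pattern differs.
    assert crc_of_chunks(record_chunks) == zlib.crc32(b"".join(record_chunks))
    print(hex(crc_of_chunks(record_chunks)), hex(zlib.crc32(page_image)))
```

The chunked loop pays per-call overhead on very short inputs, while the
single call over the page image processes a long run of bytes at once.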

- Heikki
