Re: Enabling Checksums

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-04-12 20:42:42
Message-ID: CA+U5nM+jvGMatxp0LHE90RAjCSwjGU7KzMs0uMjEFWMEkS6qag@mail.gmail.com
Lists: pgsql-hackers

On 12 April 2013 21:03, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> No, the patch has to compute the 16-bit checksum for the page when the
> full-page image is added to the WAL record. There would otherwise be no need
> to calculate the page checksum at that point, but only later when the page
> is written out from shared buffer cache.
>
> I think this is a bad idea. It complicates the WAL format significantly.
> Simon's patch didn't include the changes to recovery to validate the
> checksum, but I suspect it would be complicated. And it reduces the
> error-detection capability of WAL recovery. Keep in mind that unlike page
> checksums, which are never expected to fail, so even if we miss a few errors
> it's still better than nothing, the WAL checksum is used to detect
> end-of-WAL. There is expected to be a failure every time we do crash
> recovery. Thus far, we've considered the probability of one in 2^32 small
> enough for that purpose, but IMHO one in 2^16 is much too weak.
>
> If you want to speed up the CRC calculation of full-page images, you could
> have an optimized version of the WAL CRC algorithm, using e.g. SIMD
> instructions. Because typical WAL records are small, max 100-200 bytes, and
> each consists of several even smaller chunks, the normal WAL CRC calculation
> is quite resistant to common optimization techniques. But it might work for
> the full-page images. Let's not conflate it with the page checksums, though.

I accept the general tone of that as a reasonable perspective and in
many ways am on the fence myself. This is sensitive stuff.

A few points:
* The code to validate the checksum isn't complex, though it is more
than the current one line. Let's say about 10 lines of clear code. I'll
work on that to show it's true. I don't see that as a point of
objection.

* WAL checksum is not used as the sole basis for end-of-WAL discovery.
We reuse the WAL files, so the prev field in each WAL record shows
what the previous end of WAL was. Hence if the WAL checksums give a
false positive we still have a double check that the data really is
wrong. It's unbelievable that you'd get a false positive and then have
the prev field match as well, even though it was the genuine
end-of-WAL.

Yes, we could also have a second SIMD calculation optimised for WAL
CRC32 on an 8192 byte block, rather than just one set of SIMD code for
both. We could also have a single set of SIMD code producing a 32-bit
checksum, then take the low 16 bits as we do currently.
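
As a rough sketch of that last option, in C: compute one 32-bit value over
the block, use it whole where 32 bits are wanted, and keep only the low 16
bits for the page header. The hash used here is plain FNV-1a, chosen only
because it is short; it is a stand-in and not the algorithm in the patch,
and demo_checksum32() and demo_checksum16() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in 32-bit checksum (FNV-1a). An assumption for illustration only;
 * a real implementation would use the SIMD-friendly algorithm under
 * discussion.
 */
static uint32_t
demo_checksum32(const unsigned char *data, size_t len)
{
    uint32_t hash = 2166136261u;    /* FNV offset basis */
    size_t   i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++)
    {
        hash ^= data[i];
        hash *= 16777619u;          /* FNV prime, wraps modulo 2^32 */
    }
    return hash;
}

/* Page-header variant: the same calculation, truncated to the low 16 bits. */
static uint16_t
demo_checksum16(const unsigned char *data, size_t len)
{
    return (uint16_t) (demo_checksum32(data, len) & 0xFFFFu);
}
```

Both callers share one code path, so there is only one set of optimised
code to maintain; the WAL side keeps all 32 bits and the page side discards
the upper half.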

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
