Re: Enabling Checksums

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-04-16 19:45:41
Message-ID: CA+U5nMKnQFgkgNFbOBAe58PiPD_HnEc5GpGv9KZJqRE3n94Dpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 April 2013 20:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
>> On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> * WAL checksum is not used as the sole basis for end-of-WAL discovery.
>>> We reuse the WAL files, so the prev field in each WAL record shows
>>> what the previous end of WAL was. Hence if the WAL checksums give a
>>> false positive we still have a double check that the data really is
>>> wrong. It's unbelievable that you'd get a false positive and then have
>>> the prev field match as well, even though it was the genuine
>>> end-of-WAL.
>
>> This is kind of true and kind of not true. If a system loses power
>> while writing lots of data to WAL then the blocks at the end of the
>> WAL might not be written out in order. Everything since the last log
>> sync might be partly written out and partly not written out. That's
>> the case where the checksum is critical. The beginning of a record
>> could easily be written out including xl_prev and the end of the
>> record not written. 1/64,000 power losses would then end up with an
>> assertion failure or corrupt database.
>
> I have a hard time believing that it's a good idea to add checksums to
> data pages and at the same time weaken our ability to detect WAL
> corruption. So this seems to me to be going in the wrong direction.
> What's it buying for us anyway? A few CPU cycles saved during WAL
> generation? That's probably swamped by the other costs of writing WAL,
> especially if you're using replication.

This part of the thread is dead now .... I said "lets drop this idea"
on 13 April.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2013-04-16 20:20:26 Re: Enabling Checksums
Previous Message Tom Lane 2013-04-16 19:27:46 Re: Enabling Checksums