Re: 16-bit page checksums for 9.2

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-06 20:03:49
Message-ID: 201201062103.50231.andres@anarazel.de
Lists: pgsql-hackers

On Friday, January 06, 2012 08:53:38 PM Robert Haas wrote:
> On Fri, Jan 6, 2012 at 2:48 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On Friday, January 06, 2012 08:45:45 PM Heikki Linnakangas wrote:
> >> On 06.01.2012 20:26, Simon Riggs wrote:
> >> > The following patch (v4) introduces a new WAL record type that writes
> >> > backup blocks for the first hint on a block in any checkpoint that has
> >> > not previously been changed. IMHO this fixes the torn page problem
> >> > correctly, though at some additional loss of performance but not the
> >> > total catastrophe some people had imagined. Specifically we don't need
> >> > to log anywhere near 100% of hint bit settings, much more like 20-30%
> >> > (estimated not measured).
> >>
> >> How's that going to work during recovery? Like in hot standby.
> >
> > How's recovery a problem? Unless I miss something that doesn't actually
> > introduce a new possibility to transport hint bits to the standby (think
> > fpw's). A new transport will obviously increase traffic but ...
>
> The standby can set hint bits locally that weren't set on the data it
> received from the master. This will require rechecksumming and
> rewriting the page, but obviously we can't write the WAL records
> needed to protect those writes during recovery. So a crash could
> create a torn page, invalidating the checksum.
Err. Stupid me, thanks.
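
To make the failure mode concrete, here is a minimal sketch of it. Nothing in it
comes from the patch: the page layout (checksum in the first two bytes), the
additive checksum, the 512-byte sector size and every function name are
assumptions chosen purely for illustration.

```python
# Toy model of a torn page write after a hint bit is set without WAL protection.
# All names and the page layout here are hypothetical, not PostgreSQL's.

PAGE_SIZE = 8192     # assumed page size
SECTOR_SIZE = 512    # assumed unit the storage writes atomically
CSUM_BYTES = 2       # assumed: 16-bit checksum stored at the start of the page


def checksum16(page: bytes) -> int:
    """Toy 16-bit checksum over everything after the checksum field."""
    return sum(page[CSUM_BYTES:]) & 0xFFFF


def stamp(page: bytearray) -> None:
    """Recompute the checksum and store it in the page header."""
    page[:CSUM_BYTES] = checksum16(bytes(page)).to_bytes(CSUM_BYTES, "little")


def verify(page: bytes) -> bool:
    """Return True if the stored checksum matches the page contents."""
    return int.from_bytes(page[:CSUM_BYTES], "little") == checksum16(page)


def torn_write(on_disk: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first N sectors of `new` reached disk."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + on_disk[cut:]


# A page as received from the master, with a valid checksum.
old = bytearray(PAGE_SIZE)
stamp(old)

# The standby sets a hint bit locally (far from the header) and rechecksums.
new = bytearray(old)
new[6000] |= 0x01
stamp(new)

# Crash after one sector: the new checksum is on disk, the hint bit is not.
torn = torn_write(bytes(old), bytes(new), sectors_written=1)

print(verify(bytes(old)), verify(bytes(new)), verify(torn))
```

Both the old and the new page verify on their own; only the mix of the two
fails, and without a backup block in WAL there is nothing to repair it from.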

> Ignoring checksum errors during Hot Standby operation doesn't fix it,
> either, because eventually you might want to promote the standby, and
> the checksum will still be invalid.
It's funny. I have the feeling we are all missing a very obvious, brilliant
solution to this...

Andres
