Re: Block-level CRC checks

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 14:26:21
Message-ID: 20091201142620.GA15507@oak.highrise.ca
Lists: pgsql-hackers

* Andres Freund <andres(at)anarazel(dot)de> [091201 08:42]:
> On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
> > On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
> > <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> > > Simon Riggs wrote:
> > >> Proposal
> > >>
> > >> * We reserve enough space on a disk block for a CRC check. When a dirty
> > >> block is written to disk we calculate and annotate the CRC value, though
> > >> this is *not* WAL logged.
> > >
> > > Imagine this:
> > > 1. A hint bit is set. It is not WAL-logged, but the page is dirtied.
> > > 2. The buffer is flushed out of the buffer cache to the OS. A new CRC is
> > > calculated and stored on the page.
> > > 3. Half of the page is flushed to disk (aka torn page problem). The CRC
> > > made it to disk but the flipped hint bit didn't.
> > >
> > > You now have a page with incorrect CRC on disk.
> >
> > What if we treated the hint bits as all-zeros for the purpose of CRC
> > calculation? This would exclude them from the checksum.
> That sounds like doing a complete copy of the wal page zeroing specific fields
> and then doing wal - rather expensive I would say. Both, during computing the
> checksum and checking it...

No, it has nothing to do with WAL; it has to do with writing
"pages" out... You already double-buffer them (to avoid the page
changing while you checksum it) before calling write, but the code
that writes (and later reads) pages doesn't currently have to know all
the internal "stuff" needed to decide what's a hint bit and what's not...

And adding that information into the buffer in/out would be a huge wart
on the modularity of the PG code...
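
To make the trade-off concrete, here is a rough sketch (Python rather than
PG's C, purely for illustration) of a checksum that treats hint bits as
zeros. Everything in it is an assumption: the 8192-byte page, the CRC stored
in the first 4 bytes, and the HINT_MASKS table of hint-bit locations are all
made up, and zlib.crc32 just stands in for whatever CRC would really be used.
Note that HINT_MASKS is exactly the page-layout knowledge that the in/out
code would have to be given.

```python
import zlib

PAGE_SIZE = 8192
CRC_OFFSET = 0  # hypothetical: checksum lives in the first 4 bytes of the page

# Hypothetical layout knowledge: byte offset -> bitmask of hint bits in that byte.
# In a real system this would have to be derived from the page's tuple headers.
HINT_MASKS = {100: 0x0F, 5000: 0x0F}


def plain_crc(page: bytes) -> int:
    """CRC over the whole page, with only the checksum field zeroed."""
    copy = bytearray(page)  # private copy, so the shared page is untouched
    copy[CRC_OFFSET:CRC_OFFSET + 4] = b"\x00" * 4
    return zlib.crc32(bytes(copy))


def masked_crc(page: bytes) -> int:
    """CRC over the page with the checksum field and all hint bits zeroed."""
    copy = bytearray(page)
    copy[CRC_OFFSET:CRC_OFFSET + 4] = b"\x00" * 4
    for offset, mask in HINT_MASKS.items():
        copy[offset] &= ~mask & 0xFF  # clear only the hint bits in this byte
    return zlib.crc32(bytes(copy))


def stamp(page: bytes, crc_fn) -> bytes:
    """Return a copy of the page with its checksum field filled in."""
    out = bytearray(page)
    out[CRC_OFFSET:CRC_OFFSET + 4] = crc_fn(page).to_bytes(4, "little")
    return bytes(out)


def verify(page: bytes, crc_fn) -> bool:
    """Check the stored checksum against a freshly computed one."""
    stored = int.from_bytes(page[CRC_OFFSET:CRC_OFFSET + 4], "little")
    return stored == crc_fn(page)


def torn_write(crc_fn) -> bytes:
    """Simulate Heikki's scenario: a hint bit is set in the second half of the
    page, the page is re-stamped, and only the first half reaches disk."""
    base = bytearray(PAGE_SIZE)
    base[200:210] = b"tuple data"
    old = stamp(bytes(base), crc_fn)      # what is on disk before the write
    dirty = bytearray(old)
    dirty[5000] |= 0x01                   # hint bit set, not WAL-logged
    new = stamp(bytes(dirty), crc_fn)     # what we tried to write
    half = PAGE_SIZE // 2
    return new[:half] + old[half:]        # new CRC, old hint bit
```

With plain_crc the torn page fails verification, which is the false alarm
from the scenario quoted above; with masked_crc it still verifies, because
the only bytes that differ are excluded from the checksum. The price is that
both the write and the read path need something like HINT_MASKS.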

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.
