From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Aidan Van Dyk <aidan(at)highrise(dot)ca> |
Cc: | pgsql-hackers(at)postgresql(dot)org, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Subject: | Re: Block-level CRC checks |
Date: | 2009-12-01 14:38:41 |
Message-ID: | 200912011538.41532.andres@anarazel.de |
Lists: | pgsql-hackers |
On Tuesday 01 December 2009 15:26:21 Aidan Van Dyk wrote:
> * Andres Freund <andres(at)anarazel(dot)de> [091201 08:42]:
> > On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
> > > On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
> > > <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> > > > Simon Riggs wrote:
> > > >> Proposal
> > > >>
> > > >> * We reserve enough space on a disk block for a CRC check. When a
> > > >> dirty block is written to disk we calculate and annotate the CRC
> > > >> value, though this is *not* WAL logged.
> > > >
> > > > Imagine this:
> > > > 1. A hint bit is set. It is not WAL-logged, but the page is dirtied.
> > > > 2. The buffer is flushed out of the buffer cache to the OS. A new CRC
> > > > is calculated and stored on the page.
> > > > 3. Half of the page is flushed to disk (aka torn page problem). The
> > > > CRC made it to disk but the flipped hint bit didn't.
> > > >
> > > > You now have a page with incorrect CRC on disk.
> > >
> > > What if we treated the hint bits as all-zeros for the purpose of CRC
> > > calculation? This would exclude them from the checksum.
> >
> > That sounds like doing a complete copy of the wal page zeroing specific
> > fields and then doing wal - rather expensive I would say. Both, during
> > computing the checksum and checking it...
> No, it has nothing to do with WAL, it has to do with when writing
> "pages" out... You already double-buffer them (to avoid the page
> changing while you checksum it) before calling write, but the code
> writing (and then reading) pages doesn't currently have to know all the
> internal "stuff" needed to decide what's a hint bit and what's not...
Err, yes. That "WAL" slipped in, sorry. But it would still mean either a third
copy of the page or some rather complex jumping around on the page...
Andres
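
The two options weighed above (an extra copy of the page versus jumping
around on the page) can be sketched roughly as follows. This is only an
illustration, not PostgreSQL code: the 8192-byte page, the hint_masks
mapping of byte offset to bit mask, and both function names are
hypothetical, and zlib.crc32 merely stands in for whatever CRC would
actually be used. In a real page the hint-bit locations would have to be
found by parsing the page layout, which is the "internal stuff" the
write path currently does not need to know.

```python
import zlib

PAGE_SIZE = 8192  # assumed page size, for illustration only


def crc_with_copy(page, hint_masks):
    """Copy the page, clear the hint bits in the copy, checksum the copy."""
    scratch = bytearray(page)  # this is the extra page copy
    for offset, mask in hint_masks.items():
        scratch[offset] &= ~mask & 0xFF
    return zlib.crc32(scratch)


def crc_in_place(page, hint_masks):
    """Checksum the page in runs, feeding a masked byte at each hint offset."""
    view = memoryview(page)
    crc = 0
    pos = 0
    for offset in sorted(hint_masks):
        # Run of ordinary bytes up to the next byte that holds hint bits.
        crc = zlib.crc32(view[pos:offset], crc)
        # The hint-carrying byte itself, with its hint bits treated as zero.
        masked = page[offset] & ~hint_masks[offset] & 0xFF
        crc = zlib.crc32(bytes([masked]), crc)
        pos = offset + 1
    # Remainder of the page after the last hint-carrying byte.
    return zlib.crc32(view[pos:], crc)


# Hypothetical page contents and hint-bit positions (offset -> bit mask).
page = bytes(i % 251 for i in range(PAGE_SIZE))
hint_masks = {100: 0x0F, 4000: 0x03}

# Simulate a hint bit being set after the checksum was computed.
hinted = bytearray(page)
hinted[100] |= 0x01
hinted = bytes(hinted)
```

Both functions return the same value for a given page, and that value does
not change when only a masked hint bit flips, whereas a plain CRC over the
whole page does change. The first variant pays for a full page copy; the
second avoids the copy but makes one CRC call per run, so its cost grows
with the number of hint-carrying bytes on the page.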