crash-safe visibility map, take four

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: crash-safe visibility map, take four
Date: 2011-03-22 20:43:00
Message-ID: AANLkTino+UehSpXz+-ZP4Q58mSjW_JHx7Q4GUiPfRbC2@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 1, 2010 at 11:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> As far as I can tell, there are basically two viable solutions on the
> table here.
>
> 1. Every time we observe a page as all-visible, (a) set the
> PD_ALL_VISIBLE bit on the page, without bumping the LSN; (b) set the
> bit in the visibility map page, bumping the LSN as usual, and (c) emit
> a WAL record indicating the relation and block number.  On redo of
> this record, set both the page-level bit and the visibility map bit.
> The heap page may hit the disk before the WAL record, but that's OK;
> it just might result in a little extra work until some subsequent
> operation gets the visibility map bit set.  The visibility map page
> may hit the disk before the heap page, but that's OK too, because
> the WAL record will already be on disk due to the LSN interlock.  If a
> crash occurs before the heap page is flushed, redo will fix the heap
> page.  (The heap page will get flushed as part of the next checkpoint,
> if not sooner, so by the time the redo pointer advances past the WAL
> record, there's no longer a risk.)
>
> 2. Every time we observe a page as all-visible, (a) set the
> PD_ALL_VISIBLE bit on the page, without bumping the LSN, (b) set the
> bit in the visibility map page, bumping the LSN if a WAL record is
> issued (which only happens sometimes, read on), and (c) emit a WAL
> record indicating the "chunk" of 128 visibility map bits which
> contains the bit we just set - but only if we're now dealing with a
> new group of 128 visibility map bits or if a checkpoint has intervened
> since the last such record we emitted.  On redo of this record, clear
> the visibility map bits in each chunk.  The heap page may hit the disk
> before the WAL record, but that's OK for the same reasons as in plan
> #1.  The visibility map page may hit the disk before the heap page,
> but that's OK too, because the WAL record will already be on disk due
> to the LSN interlock.  If a crash occurs before the heap page makes
> it to disk, then redo will clear the visibility map bits, leaving them
> to be reset by a subsequent VACUUM.
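
To make the logging rule in plan #2 concrete, here is a toy Python sketch of it. This is an illustration only, not PostgreSQL code: the class and method names are made up, LSNs are not modeled, and redo simply replays every logged record rather than only those after the redo pointer.

```python
CHUNK_BITS = 128  # chunk size named in plan #2


class ChunkedVisibilityMap:
    """Hypothetical toy model of plan #2's 'log once per chunk' rule."""

    def __init__(self):
        self.bits = set()                  # block numbers whose map bit is set
        self.wal = []                      # chunk numbers that were WAL-logged
        self.last_logged_chunk = None
        self.checkpoint_since_log = False

    def checkpoint(self):
        # A checkpoint forces the next set_bit() to log its chunk again.
        self.checkpoint_since_log = True

    def set_bit(self, block):
        chunk = block // CHUNK_BITS
        # Log only for a new chunk, or if a checkpoint has intervened
        # since the last record we emitted.
        if chunk != self.last_logged_chunk or self.checkpoint_since_log:
            self.wal.append(chunk)
            self.last_logged_chunk = chunk
            self.checkpoint_since_log = False
        self.bits.add(block)

    def redo(self):
        # Redo clears every bit in each logged chunk; a later VACUUM
        # would have to set them again.
        for chunk in self.wal:
            lo = chunk * CHUNK_BITS
            self.bits -= set(range(lo, lo + CHUNK_BITS))
```

In this model, setting bits for blocks 0, 5 and 127 produces a single record for chunk 0, and the first bit set after a checkpoint always produces a fresh record.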

I took a crack at implementing the first approach described above,
which seems to be by far the simplest idea we've come up with to date.
Patch attached. It doesn't seem to be that complicated, which could
mean either that it's not that complicated or that I'm missing
something. Feel free to point and snicker in the latter case.
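
For readers who want to poke at the crash cases, the first approach can be modeled in a few lines of Python. This is a toy sketch under loud assumptions, not the attached patch: each relation has a single visibility map page, WAL is a list of (lsn, block) tuples, and all names are invented for illustration.

```python
class ToyCluster:
    """Hypothetical toy model of plan #1; not PostgreSQL code."""

    def __init__(self):
        self.wal = []            # WAL records as (lsn, block), in memory
        self.wal_flushed = 0     # highest LSN known to be on disk
        self.heap_mem = set()    # blocks with PD_ALL_VISIBLE set in buffers
        self.heap_disk = set()   # same, as last written to disk
        self.vm_mem = set()      # visibility map bits in buffers
        self.vm_disk = set()     # visibility map bits on disk
        self.vm_lsn = 0          # LSN of the buffered visibility map page

    def mark_all_visible(self, block):
        self.heap_mem.add(block)          # (a) page-level bit, LSN untouched
        lsn = len(self.wal) + 1
        self.wal.append((lsn, block))     # (c) WAL record: block number
        self.vm_mem.add(block)            # (b) map bit ...
        self.vm_lsn = lsn                 # ... with the usual LSN bump

    def flush_heap(self):
        # The heap page LSN was not bumped, so nothing forces WAL out first.
        self.heap_disk = set(self.heap_mem)

    def flush_vm(self):
        # LSN interlock: WAL up to the page LSN must be durable first.
        self.wal_flushed = max(self.wal_flushed, self.vm_lsn)
        self.vm_disk = set(self.vm_mem)

    def crash_and_redo(self):
        # Buffers and unflushed WAL are lost.
        self.wal = [rec for rec in self.wal if rec[0] <= self.wal_flushed]
        self.heap_mem = set(self.heap_disk)
        self.vm_mem = set(self.vm_disk)
        # Redo sets both the page-level bit and the map bit.
        for _lsn, block in self.wal:
            self.heap_mem.add(block)
            self.vm_mem.add(block)
        self.vm_lsn = self.wal_flushed

    def consistent(self):
        # The property we care about: a map bit implies the page bit.
        return self.vm_mem <= self.heap_mem
```

The two interesting orderings are flush_vm() followed by a crash (redo repairs the heap page) and flush_heap() followed by a crash (the map bit is simply lost, which costs a little extra work later). In both, consistent() should still hold afterwards.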

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
visibility-map-v1.patch application/octet-stream 12.7 KB
