Re: crash-safe visibility map, take three

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash-safe visibility map, take three
Date: 2010-11-30 07:34:04
Message-ID: 4CF4A8EC.2070408@enterprisedb.com
Lists: pgsql-hackers

On 30.11.2010 06:57, Robert Haas wrote:
> I can't say I'm totally in love with any of these designs. Anyone
> else have any ideas, or any opinions about which one is best?

Well, the design I've been pondering goes like this:

At vacuum:

1. Write an "intent" XLOG record listing a chunk of visibility map bits
that are not currently set and that we are going to try to set. A chunk
of, say, 100 bits would be about right.

2. Scan the 100 heap pages as we currently do, setting the visibility
map bits as we go.

3. After the scan, lock the visibility map page, check which of the bits
that we set in step 2 are still set (concurrent updates might've cleared
some), and write a final XLOG record listing the set bits. This step
isn't necessary for correctness, BTW, but without it you lose all the
set bits if you crash before the next checkpoint.
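
To make the three steps concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not PostgreSQL code: the visibility map is a plain set of block numbers, the WAL is a list, and the names vacuum_chunk, page_is_all_visible and CHUNK are made up for illustration.

```python
CHUNK = 100  # bits per intent record, the rough size suggested above


def vacuum_chunk(vm, wal, candidates, page_is_all_visible):
    """Run steps 1-3 for one chunk (hypothetical helper).

    vm: set of heap block numbers whose visibility map bit is set.
    wal: list used as an append-only log.
    candidates: iterable of block numbers to consider.
    page_is_all_visible: callable(block) -> bool, stands in for the heap scan.
    """
    chunk = [b for b in candidates if b not in vm][:CHUNK]

    # Step 1: log the intent before touching any bit.
    wal.append(("intent", tuple(chunk)))

    # Step 2: scan the heap pages, setting bits as we go.
    set_by_us = []
    for blk in chunk:
        if page_is_all_visible(blk):
            vm.add(blk)
            set_by_us.append(blk)

    # Step 3: re-check which of our bits are still set and log those.
    still_set = tuple(b for b in set_by_us if b in vm)
    wal.append(("final", still_set))
    return still_set


vm, wal = set(), []


def scan(blk):
    if blk == 3:
        vm.discard(0)  # pretend a concurrent update cleared block 0's bit
    return blk % 2 == 0


result = vacuum_chunk(vm, wal, range(5), scan)
```

In this run blocks 0, 2 and 4 get their bits set in step 2, block 0 is cleared again by the simulated concurrent update, and the final record lists only blocks 2 and 4.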

At replay, when we see the intent XLOG record, clear all the bits listed
in it. This ensures that if we crashed and some of the visibility map
bits were flushed to disk but the corresponding changes to the heap
pages were not, the bits are cleared. When we see the final XLOG record,
we set the bits.
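
The replay rule can be sketched the same way. Again this is a toy model with made-up names (replay, vm_on_disk), assuming the on-disk visibility map is a set of block numbers and the WAL is a list of (kind, blocks) tuples.

```python
def replay(vm_on_disk, wal):
    """Apply intent/final records to a copy of the on-disk map (hypothetical)."""
    vm = set(vm_on_disk)
    for kind, blocks in wal:
        if kind == "intent":
            # Bits may have reached disk without the matching heap changes,
            # so distrust every bit the vacuum said it would try to set.
            vm.difference_update(blocks)
        elif kind == "final":
            # The vacuum confirmed these bits after its scan.
            vm.update(blocks)
    return vm


# Bit 1 was set long ago; bits 3 and 7 were flushed by the crashed vacuum.
vm_on_disk = {1, 3, 7}

# Crash before the final record was written: bits 3 and 7 are cleared.
after_crash_mid_scan = replay(vm_on_disk, [("intent", (3, 7, 9))])

# Crash after the final record was written: bits 3 and 7 are kept.
after_crash_post_final = replay(
    vm_on_disk, [("intent", (3, 7, 9)), ("final", (3, 7))]
)
```

The first case shows why step 3 matters for performance only: the map stays correct without the final record, but the work of setting bits 3 and 7 is lost.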

Some care is needed with checkpoints. Setting visibility map bits in
step 2 is safe because crash recovery will replay the intent XLOG record
and clear any incorrectly set bits. But if a checkpoint has happened
after the intent XLOG record was written, that's not true: recovery
starts from the checkpoint's redo pointer, so the earlier intent record
is never replayed. This can be avoided by checking RedoRecPtr in step 2,
and writing a new intent XLOG record if it has changed since the last
intent XLOG record was written.
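
A minimal sketch of that RedoRecPtr check follows. The class IntentTracker and its method are hypothetical, redo pointers are plain integers, and which blocks go into the re-issued intent record is left to the caller, since that detail is not spelled out above.

```python
class IntentTracker:
    """Remembers the redo pointer seen when the last intent was logged."""

    def __init__(self, wal):
        self.wal = wal
        self.redo_at_intent = None  # no intent record written yet

    def ensure_intent(self, blocks, current_redo_ptr):
        """Log a new intent record if the redo pointer has moved.

        Returns True if a record was written, False if the previous
        intent record would still be replayed by crash recovery.
        """
        if self.redo_at_intent != current_redo_ptr:
            self.wal.append(("intent", tuple(blocks)))
            self.redo_at_intent = current_redo_ptr
            return True
        return False


wal = []
tracker = IntentTracker(wal)
first = tracker.ensure_intent([10, 11, 12], current_redo_ptr=100)   # True
second = tracker.ensure_intent([11, 12], current_redo_ptr=100)      # False
third = tracker.ensure_intent([12], current_redo_ptr=200)           # True
```

The third call models a checkpoint having moved the redo pointer from 100 to 200 in the middle of step 2, which forces a fresh intent record before the next bit is set.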

There's a small race condition in the way a visibility map bit is
currently cleared. When a heap page is updated, it is locked, the update
is WAL-logged, and the lock is released. The visibility map page is
updated only after that. If the final vacuum XLOG record is written just
after updating the heap page, but before the visibility map bit is
cleared, replaying the final XLOG record will set a bit that should not
have been set.
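
The problematic interleaving can be reproduced in a toy model. Everything here is illustrative: the record names are made up, and the sketch assumes that replaying a heap update also clears the page's visibility map bit.

```python
def replay(wal):
    """Rebuild the visibility map from an empty one (toy model)."""
    vm = set()
    for kind, blk in wal:
        if kind in ("intent", "heap_update"):
            vm.discard(blk)
        elif kind == "final":
            vm.add(blk)
    return vm


def racy_interleaving():
    vm, wal = set(), []
    wal.append(("intent", 5))       # vacuum, step 1
    vm.add(5)                       # vacuum, step 2
    wal.append(("heap_update", 5))  # updater: lock page, WAL-log, unlock
    if 5 in vm:                     # vacuum, step 3: bit still looks set
        wal.append(("final", 5))
    vm.discard(5)                   # updater clears the bit, too late
    return vm, wal


live_vm, wal = racy_interleaving()
recovered_vm = replay(wal)
```

In normal operation the bit for block 5 ends up cleared, but after replay it is set, because the final record follows the heap update in the WAL.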

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
