From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | "Jesper(at)Krogh(dot)cc" <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash safe visibility map vs hint bits |
Date: | 2010-12-15 14:54:42 |
Message-ID: | 201012151454.oBFEsgw21144@momjian.us |
Lists: | pgsql-hackers |
Heikki Linnakangas wrote:
> On 04.12.2010 09:14, Jesper(at)Krogh(dot)cc wrote:
> > There has been a lot of discussion about index-only scans and how to make the visibility map crash safe, followed by a good discussion about hint bits.
> >
> > What seems to be the main concern is the added wal volume and it makes me wonder if there is a way in-between that looks more like hint bits.
> >
> > How about lazily WAL-logging the complete visibility map, say every X minutes or every N tuple updates, and having WAL recovery recheck the visibility of pages touched by the WAL stream?
>
> If you WAL-log the visibility map changes after-the-fact, it doesn't
> solve the race condition we're struggling with: the visibility map
> change might hit the disk before the PD_ALL_VISIBLE change to the heap page. If
> you crash, you can end up with a situation where the PD_ALL_VISIBLE flag
> on the heap page is not set, but the bit in the visibility map is. Which
> causes serious issues later on.
Based on hacker emails and a discussion I had with Heikki while we were
in Germany, I have updated the index-only scans wiki to document a known
solution for making the visibility map crash-safe for use by index-only
scans:
http://wiki.postgresql.org/wiki/Index-only_scans#Making_the_Visibility_Map_Crash-Safe
Making the Visibility Map Crash-Safe
Currently, a heap page that has all-visible tuples is marked by vacuum
as PD_ALL_VISIBLE and the visibility map (VM) bit is set. This is
currently unlogged, and a crash could require these to be set again.
The complexity is that for index-only scans, the VM bit has meaning, and
cannot be incorrectly set (though it can be incorrectly cleared because
that would just result in additional heap access). If both
PD_ALL_VISIBLE and the VM bit were to be set, and a crash resulted in the
VM bit being written to disk, but not the PD_ALL_VISIBLE bit, a later
heap access that wrote a conditionally-visible row would not know to
clear the VM bit, causing incorrect results for index-only scans.
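The failure sequence above can be sketched as a toy model. This is only an illustration, not PostgreSQL code: the `Disk` class and the functions `crash_after_partial_flush`, `insert_row`, and `index_only_scan` are hypothetical names, and visibility is reduced to a single boolean per row.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical durable state for one heap page and its VM bit."""
    pd_all_visible: bool = False
    vm_bit: bool = False
    # Each row is (value, visible_to_all); a toy stand-in for MVCC visibility.
    rows: list = field(default_factory=list)


def crash_after_partial_flush() -> Disk:
    """Vacuum set both flags in memory, but only the VM page reached disk."""
    disk = Disk(rows=[("a", True)])
    disk.vm_bit = True  # VM page was written before the crash
    # The heap page carrying PD_ALL_VISIBLE was never written, so it stays False.
    return disk


def insert_row(disk: Disk, value: str) -> None:
    """The heap writer clears the VM bit only if it sees PD_ALL_VISIBLE."""
    if disk.pd_all_visible:
        disk.pd_all_visible = False
        disk.vm_bit = False
    disk.rows.append((value, False))  # new row is not yet visible to everyone


def index_only_scan(disk: Disk) -> list:
    """Trust the VM bit; check row visibility only when the bit is clear."""
    if disk.vm_bit:
        return [value for value, _ in disk.rows]
    return [value for value, visible in disk.rows if visible]


disk = crash_after_partial_flush()
insert_row(disk, "b")
result = index_only_scan(disk)  # includes "b", which should not be visible
```

In this model the stale VM bit survives the insert, so the scan skips the visibility check and returns a row it should have filtered out.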
The solution is to WAL-log the setting of VM bits. This will cause
full-page writes for the VM page, but this is much less WAL volume than
WAL-logging each heap page because a VM page represents many heap pages.
This requires that the VM page not be written to disk until its VM-set
WAL record is fsynced to disk. Also, during crash recovery, replaying the
VM-set WAL record would cause both the VM bit and the heap
PD_ALL_VISIBLE flag to be set.
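A minimal sketch of that rule, again as a toy model rather than the actual implementation: the `Storage` class, its method names, and the use of a list index as an LSN are all assumptions made for illustration.

```python
class Storage:
    """Toy model of one heap page, its VM bit, and a WAL stream."""

    def __init__(self):
        self.wal = []              # WAL records, possibly not yet durable
        self.wal_flushed = 0       # number of WAL records fsynced to disk
        self.vm_page_lsn = 0       # position of the last record touching the VM page
        self.mem_vm = False        # in-memory (buffer) state
        self.mem_pd_all_visible = False
        self.disk_vm = False       # on-disk state
        self.disk_pd_all_visible = False

    def set_all_visible(self):
        """Vacuum marks the page all-visible and logs a VM-set record."""
        self.wal.append("VM_SET")
        self.vm_page_lsn = len(self.wal)
        self.mem_pd_all_visible = True
        self.mem_vm = True

    def flush_wal(self):
        self.wal_flushed = len(self.wal)

    def write_vm_page(self):
        """WAL-before-data: fsync the VM-set record before writing the VM page."""
        if self.vm_page_lsn > self.wal_flushed:
            self.flush_wal()
        self.disk_vm = self.mem_vm

    def crash_and_recover(self):
        """Lose memory, reload from disk, then replay the durable WAL."""
        durable = self.wal[:self.wal_flushed]
        self.wal = list(durable)
        self.mem_vm = self.disk_vm
        self.mem_pd_all_visible = self.disk_pd_all_visible
        for record in durable:
            if record == "VM_SET":
                # Redo sets both flags, so they can never disagree.
                self.mem_vm = True
                self.mem_pd_all_visible = True


s = Storage()
s.set_all_visible()
s.write_vm_page()       # forces the WAL flush first
s.crash_and_recover()   # heap page never reached disk, but redo repairs it
```

In this sketch either the VM-set record is durable, in which case replay sets both flags, or it is not, in which case the VM page cannot have reached disk and both flags stay clear. The dangerous state (VM bit set, PD_ALL_VISIBLE clear) is not reachable.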
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +