Visibility map thoughts

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Visibility map thoughts
Date: 2007-11-05 09:52:54
Message-ID: 472EE7F6.4070005@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Though 8.3 isn't out of the oven just yet, I've been thinking about the
dead space map a lot, and decided I have to start writing down those
thoughts.

Reducing VACUUM time is important, but the real big promise is the
ability to do index-only-scans. Because that's the main focus of this
exercise, I'm calling it the the Visibility Map from now on, because
it's not about tracking dead space, but tuple visibility in general.
Don't worry, reduced VACUUM times on read-mostly tables with hot spots
will still fall out of it.

What to store in the Visibility Map?
------------------------------------

Each potential user of the visibility map needs slightly different
information:

a) VACUUM needs to know if a page has enough dead tuples that it's
interesting to visit.
b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that
can be frozen.
c) Index-only-scans needs to know if all tuples on a page are visible to
all transactions. HOT updates are OK, though.

From this point on, this document concentrates on a map suited for
index-only-scans.

The basic structure is a bitmap with one bit per heap page. A set bit
means "all tuples on heap page are visible to all transactions". A
cleared bit means that we don't know if that's true or not.

This kind of map is useful for VACUUM as well, though VACUUM could get
away with less strict semantics, and we might not want to visit a page
in VACUUM if there's only very few dead tuples on a page. What this
implementation gives us is a conservative VACUUM that will always find
all the same dead tuples as a regular VACUUM (except for HOT updated
tuples which can be pruned without scanning indexes). VACUUM using the
visibility map can't update stats, so we'd still want regular vacuums
every now and then, or rely on ANALYZE to do that.

It's not useful for VACUUM FREEZE, unless we're willing to freeze much
more aggressively, and change the meaning of a set bit to "all tuples on
heap page are frozen".

Other kinds of visibility maps could be useful in some narrow scenarios.
For example, if we had a map of pages that have no live tuples, we could
skip those pages in sequential scans. We could also use more than one
bit per page to store more information, but let's keep it simple for now.

Where to store the visibility map?
----------------------------------
a) In a fixed size shared memory struct. Not acceptable.

b) In the special area of every nth heap page. I played a bit with this
back in February 2006, because it's simple and requires few changes.

c) As a new relation kind, like toast tables.

d) As an auxiliary smgr relation, attached to the heap.

I'm leaning towards D at the moment. It requires a little bit of changes
to the buffer manager API and elsewhere, but not too much.

B is not good because we might want to have other kinds of auxiliary
files in the future (FSM in particular). And it is a bit hacky anyway.
Let's do something more generic.

C doesn't seem like the right approach, because a visibility map doesn't
really have much in common with "real" relations.

In any case, the bitmap is divided into pages, and the pages live in
shared_buffers, and are managed by the buffer manager like any other page.

Updating the visibility map
---------------------------

A single page in the visibility map covers a wide range of heap pages
((8192 - page header) * 8 = ~65000 heap pages). It can easily become
locking bottleneck, if every update on any page in that range needs to
lock the visibility map page.

Setting a bit is just a hint. It's ok to lose it on crash. However,
setting a bit mustn't hit the disk too early. What might otherwise
happen is that the change that made all tuples on a page visible, like
committing an inserting transaction, isn't replayed after crash, but the
set bit is already on disk. In other words, setting a bit must set the
LSN of the visibility map page.

Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's
ok if it hits the disk before the operation that cleared it. Therefore
we don't need to set the LSN, but it must be redone at WAL replay of
heap insert/update/delete.

Because a concurrent update on the same word might be lost if we didn't
lock the page at all, we need to lock the visibility map page to set or
clear a bit. Since these are quick operations, especially clearing a
bit, perhaps we could use just the buffer header spinlock.

To further reduce contention, we can have a copy of the bit in the page
header of the heap page itself. That way we'd only need to access the
visibility map on the first update on a page that clears a bit.

When to update the visibility map?
----------------------------------
- Clear bit on every update/insert/delete.
- Set bit in vacuum or prune, if all tuples on page are now visible to
all transactions.
- If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't
set the bit yet, but as soon as the maximum xmin on the page <
OldestXmin, we can. Call PageSetPrunable accordingly, to prune again
after that happens.
- We don't need to clear the bit on HOT updates, because by definition
none of the indexed columns changed.

Next we need to figure out how to actually do the index-only-scans,
using the visibility map. And how to use it for VACUUMs; Itagaki's dsm
patch addressed that, and assuming it does what we want, that part of
the patch is still usable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2007-11-05 10:02:27 Re: Fwd: Clarification about HOT
Previous Message Gokulakannan Somasundaram 2007-11-05 09:48:00 Fwd: Clarification about HOT