The idea of using a so called dead space map to speed up vacuum has come
up multiple times in this list in the last couple of years. I wrote an
initial implementation of it to measure the performance impact it has on
updates and on vacuum.
Potential uses for a dead space map are:
* speed up vacuum when there's few dead tuples
Vacuum will need to be modified to use index lookups to find index tuples
corresponding the dead heap tuples. Otherwise you have to scan through
all the indexes anyway.
* vacuuming pages one by one as they're written by bgwriter
I'm not sure how much difference this would make, but it would be an
interesting experiment. In theory, you could save a lot of total I/O,
because you would not need to come back to vacuum the pages later, but you
would have to read in any index pages pointing to the dead heap tuples
* implementation of index-only scans
An index scan would not have to check the visibility information of heap
tuples on those heap pages that are marked as clean in the dead space map.
This requires that the dead space map is implemented so that a page is
reliably marked as dirty in all circumstances when it contains any tuples
that are not visible to all backends.
The obvious drawback is that heap updates need to update the dead space
My current implementation stores a bitmap of 32k bits in the special space
of every 32k heap pages. Each bit in the bitmap corresponds one heap page.
The bit is set every time a tuple is updated, and it's cleared by vacuum.
This is a very simple approach, and doesn't take much space.
Is there something I'm missing? Any ideas?
I'm going to have some spare time to hack PostgreSQL in the coming
months, and I'm thinking of refining this if there's interest. Is anyone
else working on this?
pgsql-hackers by date
|Next:||From: Luke Lonergan||Date: 2006-02-27 18:02:27|
|Subject: Re: Dead Space Map|
|Previous:||From: Simon Riggs||Date: 2006-02-27 17:44:48|
|Subject: Re: Scrollable cursors and Sort performance|