Re: Dead Space Map version 3 (simplified)

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dead Space Map version 3 (simplified)
Date: 2007-04-23 11:37:36
Message-ID: 462C9A80.3070003@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>> If I'm reading the code correctly, DSM makes no attempt to keep the
>> chunks ordered by block number. If that's the case, vacuum needs to be
>> modified because it currently relies on the fact that blocks are scanned
>> and the dead tuple list is therefore populated in order.
>
> Vacuum still scans heaps in block order and picks up corresponding DSM
> chunks. Therefore the order of DSM chunks is not important. This method
> is not efficient for huge tables with little dead space, but I think it
> won't become a serious issue.

Ok, I can see now that the iterator returns pages in heap order, so no
problem there.

Looking closer at FlushBuffer: before flushing a page to disk, the page
is scanned to count the number of vacuumable tuples on it. That has the
side effect of setting all the hint bits, which is something that we've
been thinking of doing anyway (there's a TODO on that as well). It adds
some CPU overhead to writing dirty buffers, which I personally don't
believe is a problem at all, but it's worth noting. I don't believe it's
a problem because if your system is I/O bound, the CPU overhead
doesn't really matter, and if it saves any I/O later by not having to
dirty the page again just to write the hint bits, the benefit definitely
outweighs the cost. And if your system is CPU bound, there shouldn't be
enough FlushBuffer calls happening for it to matter much.

I think we should set the bit in the DSM whenever there's any dead space
in the block, instead of having a threshold of 2 dead tuples or BLCKSZ/4
bytes of dead space. A pathological example is a relatively seldom
updated table with a fillfactor set so that there's room for exactly one
update on each page. The DSM will never have any bits set for the table,
because there's never room for more than one dead tuple on any page.

But if we don't bother with the threshold, we don't need to scan the
tuples in FlushBuffer to count the dead tuples. I think it'd still be
worth scanning them just to set the hint bits, but that becomes an
orthogonal feature.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
