Re: Dead Space Map for vacuum

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dead Space Map for vacuum
Date: 2006-12-28 08:17:34
Message-ID: Pine.LNX.4.58.0612281906500.1474@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 28 Dec 2006, Heikki Linnakangas wrote:

> ITAGAKI Takahiro wrote:
> > Hello,
> >
> > NTT staffs are working on TODO item:
> > | Create a bitmap of pages that need vacuuming
> >
> > We call the bitmap "Dead Space Map" (DSM), that allows VACUUM to scan
> > only pages that need vacuuming or freezing. We'd like to discuss the
> > design on hackers and make agreements with community.
>
> Great!

I agree!

>
> > We implemented the basic parts of it and measured the performance.
> > As expected, VACUUM took shorter time when the fraction of updates are low.
> > DSM is useful for large but not so heavily-updated databases. The overhead
> > of managing DSM seemed to be negligible small in our CPU-loaded tests.
> >
> > There are a lot of choices in implementation. We followed the descriptions
> > in TODO list and past discussions in some parts, but did not in other parts
> > for some reasons. I would appreciate your comments and suggestions.
>
> I experimented with a different DSM design last winter. I got busy with
> other things and never posted it AFAIR, but the idea was to store a
> bitmap in the special area on every 32k heap page. That had some advantages:
>
> * doesn't require a new dedicated shared memory area that needs to be
> allocated and tuned.
> * doesn't introduce a (single) new global lwlock that might become hotspot.
> * WAL logging is quite simple, since the bitmaps are on normal pages
> handled by buffer manager.

I too have experimented with this idea. 4 DSM pages means 2 bits per
block. What were you using the other bit for? When I discussed this some
time ago with Jan Weick, we recommended we track blocks in the FSM with
this extra bit.

One problem this approach may have is contention for the DSM pages and the
granularity of the lock associated with it. Locking a DSM page to update
one bit will affect writes to 32K pages. This might be less of a problem
than I think because of the very low cost of setting the bit.

> > | In the event of a system crash, the bitmap would probably be invalidated.
> >
> > We bought it for simplicity. Currently, all DSM are lost after crash.
> > All tables will be untracked status. Full-scanned vacuum is needed
> > after the lost of DSM.
>
> If you flush the DSM to disk at checkpoint, it should be easy to bring
> it up-to-date on WAL replay. Having to do full-scanning vacuum after
> crash can be a nasty surprise.

I agree. One interesting insight is, we have 4 bits left over (those not
used for the 4 DSM blocks). We might use those status bits for some
purpose. For example, when a page is dirtied and we update the DSM page,
we set a DSM dirtied bit. Checkpoint would then only flush DSM pages
dirtied since the last check point. This is important in the presense of
lots of read only tables.

> > | One complexity is that index entries still have to be vacuumed, and doing
> > | this without an index scan (by using the heap values to find the index entry)
> > | might be slow and unreliable, especially for user-defined index functions.
> >
> > Indexes are still needed to be full-scanned at the present moment. We are
> > also researching a retail index vacuum method, but it requires more works.
>
> Yeah, that's an old story :(.
>
> BTW: Yesterday I realized that bitmap indexes could be retail vacuumed
> safely. You'll still need to visit all bitmaps to find the dead bit, but
> you only need to check the bitmap page that corresponds the tid of the
> dead tuple.

Cool. The problem is solving it for the other AMs, particularly btree.

>
> > | http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php
> > | Maintain 2 bits per block that tell if the block has been vaccumed of all
> > | dead tuples since the last time it was dirtied, and if all its tuples are
> > | completely frozen.
> >
> > We use 1 bit per block, so we cannot separate pages that need either
> > vacuum or freeze only. The reason is that we cannot determine where to
> > record before/after updated tuples. If the transaction is commited,
> > before-version should be recorded into needs-vacuum bitmap and
> > after-version into needs-freeze bitmap. But on rollback, it is inverted.
> > We cannot judge which we should do until the end of transaction.
>
> Yeah, that makes the DSM less useful. Are you thinking of freezing more
> aggressively because of that? Or doing a full-scanning vacuum every now
> and then to freeze?

I don't see any problem with moving to two status bits per block in the
DSM-in-heap model, so we can track this better. What do you think?

Thanks,

Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dhanaraj M 2006-12-28 09:17:00 Re: Allow the identifier length to be increased via a
Previous Message lakshmi dj 2006-12-28 07:35:50 Error: 1 trigger record(s) not found for relation "pg_shadow"