Re: Setting visibility map in VACUUM's second phase

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting visibility map in VACUUM's second phase
Date: 2012-12-06 18:01:33
Message-ID: 6230.1354816893@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> So the idea that the patch implements is this. When we scan pages in
> the first phase of vacuum, if we find a page that has all-visible
> tuples but also has one or more dead tuples that we know the second
> phase of vacuum will remove, we mark such page with a special flag
> called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel
> free to suggest a better name). What this flag tells the second phase
> of vacuum is to mark the page all-visible and set the VM bit unless
> some intermediate action on the page again inserts a not-all-visible
> tuple. If such an action happens, the PD_ALL_VISIBLE_OR_DEAD flag will
> be cleared by that backend and the second phase of vacuum will not set
> all-visible flag and VM bit.

This seems overly complicated, as well as a poor use of a precious page
header flag bit. Why not just recheck all-visibility status after
performing the deletions? Keep in mind that even if there were some
not-all-visible tuples when we were on the page the first time, they
might have become all-visible by the time we return. So this is going
out of its way to produce a less-than-optimal result.

> The patch itself is quite small and works as intended. One thing that
> demanded special handling is the fact that visibilitymap_set()
> requires a cutoff xid to tell the Hot Standby to resolve conflicts
> appropriately. We could have scanned the page again during the second
> phase to compute the cutoff xid, but I thought we can overload the
> pd_prune_xid field in the page header to store this information which
> is already computed in the first phase.

And this is *far* too cute. The use of that field is complicated enough
already without having its meaning vary depending on randomly-designed
flag bits. And it's by no means obvious that using a by-now-old value
when we return to the page is a good idea anyway (or even correct).

I think taking a second whack at setting the visibility bit is a fine
idea, but let's drop all the rest of this premature optimization.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-12-06 18:08:51 Re: How to check whether the row was modified by this transaction before?
Previous Message Kevin Grittner 2012-12-06 18:00:08 Re: Functional dependency in GROUP BY through JOINs