Re: Proposal: Another attempt at vacuum improvements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Another attempt at vacuum improvements
Date: 2011-06-08 23:52:41
Message-ID: BANLkTi=-MCYvK=dmqyCvm6N-g6br3Dberw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 8, 2011 at 1:19 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> I went on to create a WIP patch based on our discussion. There are
> a couple of issues that I stumbled upon while testing it.
>
> 1. The start-of-index-vacuum LSN that we want to track must be noted
> even before the heap scan is started. This is because we must be
> absolutely sure that the index vacuum removes index pointers to all
> dead line pointers generated by any operation with LSN less than the
> start-of-index-vacuum LSN. If we don't remember the LSN before heap
> scan starts and rather delay it until the start of the index vacuum,
> new dead line pointers may get generated on a page which is already
> scanned by the heap scan but before the start of the index scan. Since
> the index pointers to these new dead line pointers haven't been
> vacuumed, we should really not be removing them.
>
> But as a consequence of using a LSN from the start of the heap scan,
> at the end of vacuum, all pruned pages will have vacuum LSN greater
> than the index vacuum LSN that we are going to remember in the
> pg_class. And by our design, we can't remove dead line pointers on
> those pages because we don't know if the index pointers have been
> vacuumed or not. We might not be able to reclaim any dead line
> pointers, if the page is again HOT pruned before the next vacuum cycle
> because that will overwrite the page vacuum LSN with a newer value.

Oh. That sucks.

> I think we definitely need to track the dead line pointers that a heap
> scan has collected. The index pointers to them will be removed if the
> vacuum completes successfully. That gets us back to the original idea
> that we had discussed a while back about marking such dead line
> pointers as LP_DEAD_RECLAIMED or something like that. When vacuum
> runs heap scan, it would collect all dead line pointers and mark them
> dead-reclaimed and also store an identifier of the vacuum operation
> that would remove the associated index pointers. During HOT cleanup or
> the next vacuum, we can safely remove the LP_DEAD_RECLAIMED line
> pointers if we can safely check if the vacuum completed successfully
> or not.  We don't have any free flags in ItemIdData, but we can use
> special lp_off to recognize a dead and dead-reclaimed line pointer.
> The identifier itself can either be an LSN or XID or anything else.
> Also, since we just need one identifier, I think this technique would
> work for unlogged and temp relations, with minor adjustments.

OK. So we have a Boolean some place. At the beginning of VACUUM, we
read and remember the old value, and set it to false. At the end of
VACUUM, after everything has succeeded, we set it to true. During HOT
cleanup, we can free dead-reclaimed line pointers if the value is
currently true. During VACUUM, we can free dead-reclaimed line
pointers if the value was true when we started.
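
A minimal sketch of that protocol in C, to make the two reclaim rules concrete. Every name here (RelVacState, VacuumRun, vacuum_begin and so on) is hypothetical and not part of PostgreSQL; where the Boolean is actually stored, and how it is kept crash-safe, is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-relation state: the "Boolean some place". */
typedef struct RelVacState
{
    bool last_vacuum_succeeded;
} RelVacState;

/* Hypothetical per-VACUUM state. */
typedef struct VacuumRun
{
    bool flag_at_start;     /* value read and remembered at the beginning */
} VacuumRun;

/* Beginning of VACUUM: remember the old value, then set the flag to false. */
static void
vacuum_begin(RelVacState *rel, VacuumRun *run)
{
    assert(rel != NULL && run != NULL);
    run->flag_at_start = rel->last_vacuum_succeeded;
    rel->last_vacuum_succeeded = false;
}

/* End of VACUUM, called only after everything has succeeded. */
static void
vacuum_end_success(RelVacState *rel)
{
    assert(rel != NULL);
    rel->last_vacuum_succeeded = true;
}

/* HOT cleanup may free dead-reclaimed line pointers if the flag is currently true. */
static bool
hot_cleanup_can_free(const RelVacState *rel)
{
    return rel->last_vacuum_succeeded;
}

/* VACUUM may free them if the flag was true when this VACUUM started. */
static bool
vacuum_can_free(const VacuumRun *run)
{
    return run->flag_at_start;
}
```

If a VACUUM fails partway through, vacuum_end_success() is never called, so the flag stays false and neither HOT cleanup nor the next VACUUM frees the line pointers that the failed run marked.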

The name dead-reclaimed doesn't inspire me very much. Dead vs.
dead-vacuumed? Morbid vs. dead?

> 2. Another issue is with analyze counting dead line pointers as dead
> rows. While it's correct in principle because a vacuum is needed to
> remove these dead line pointers, the overhead of having a dead line
> pointer is much less than that of a dead tuple. Also, with single pass
> vacuum, there will be many dead line pointers waiting to be cleaned up
> in the next vacuum or HOT-prune. We should not really count them as
> dead rows because they don't require a vacuum per se and counting them
> as dead will force more vacuum cycles than required. If we go by the
> idea described above, we can definitely skip the dead-reclaimed line
> pointers, at least when we know that index vacuum was completed
> successfully.
>
> Thoughts ?

I think we should count both dead line pointers and dead tuples, but
with two separate counters. I agree that a dead line pointer is a lot
less expensive than a dead tuple, but it's not free either.
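
As a rough illustration of keeping the two counts apart, here is a small C sketch. The ItemKind enum, the DeadStats struct, and count_dead_items() are all hypothetical names invented for this example; they do not correspond to the actual ANALYZE code, and the classification of each item is assumed to be done elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a heap item seen while sampling a page. */
typedef enum ItemKind
{
    ITEM_LIVE,
    ITEM_DEAD_TUPLE,            /* dead tuple that still occupies space */
    ITEM_DEAD_LINE_POINTER      /* only the line pointer remains */
} ItemKind;

/* Two separate counters instead of a single "dead rows" figure. */
typedef struct DeadStats
{
    size_t dead_tuples;
    size_t dead_line_pointers;
} DeadStats;

static DeadStats
count_dead_items(const ItemKind *items, size_t nitems)
{
    DeadStats stats = {0, 0};

    assert(items != NULL || nitems == 0);
    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i] == ITEM_DEAD_TUPLE)
            stats.dead_tuples++;
        else if (items[i] == ITEM_DEAD_LINE_POINTER)
            stats.dead_line_pointers++;
    }
    return stats;
}
```

With the counts kept apart, whatever decides when to vacuum can weigh a dead line pointer less heavily than a dead tuple without ignoring it altogether; how to weigh them is not settled here.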

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
