Re: Proposal: Another attempt at vacuum improvements

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Another attempt at vacuum improvements
Date: 2011-05-25 13:14:22
Message-ID: BANLkTi=Arq+vFwmFO9v7JOdcgFdJYi0UeQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 25, 2011 at 1:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> At the moment we scan indexes if we have > 0 rows to remove, which is
>> probably wasteful. Perhaps it would be better to keep a running total
>> of rows to remove, by updating pg_stats, then when we hit a certain
>> threshold in total we can do the index scan. So we don't need to
>> remember the TIDs, just remember how many there were and use that to
>> avoid cleaning too vigorously.
>
> That occurred to me, too.  If we're being launched by autovacuum then
> we know that a number of updates and deletes equal ~20% (or whatever
> autovacuum_vacuum_scale_factor is set to) of the table size have
> occurred since the last autovacuum.  But it's possible that many of
> those were HOT updates, in which case the number of index entries to
> be cleaned up might be much less than 20% of the table size.
> Alternatively, it's possible that we'd be better off vacuuming the
> table more often (say, autovacuum_vacuum_scale_factor=0.10 or 0.08 or
> something) but only doing the index scans every once in a while when
> enough dead line pointers have accumulated.  After all, it's the first
> heap pass that frees up most of the space; cleaning dead line pointers
> seems a bit less urgent.  But not having done any real analysis of how
> this would work out in practice, I'm not sure whether it's a good idea
> or not.

We know whether a TID was once in the index or not, so we can keep an
exact count. HOT doesn't come into it.
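As a rough illustration of the counting idea (a sketch only, not PostgreSQL code): each heap pass adds the number of dead line pointers that once had index entries to a running total, and the index scan is deferred until that total crosses a threshold. The class name, the threshold parameter and the (tid, was_indexed) pair format are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class DeadPointerCounter:
    """Hypothetical per-table running total; not an actual PostgreSQL structure."""

    index_scan_threshold: int  # assumed tunable, not an existing setting
    dead_indexed: int = 0      # dead TIDs that still need index cleanup

    def record_heap_pass(self, pruned):
        """Add one heap pass's results to the running total.

        `pruned` is an iterable of (tid, was_indexed) pairs. Only TIDs that
        were once in the index are counted, so HOT-only tuples never inflate
        the total. Returns True once an index scan looks worthwhile.
        """
        self.dead_indexed += sum(1 for _tid, was_indexed in pruned if was_indexed)
        return self.dead_indexed >= self.index_scan_threshold

    def index_scan_done(self):
        """Reset the total after the index scan has removed the entries."""
        self.dead_indexed = 0
```

Only the count is remembered between passes, not the TIDs themselves, which matches the suggestion quoted above.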

We can also remove TIDs from the index without VACUUM, during btree
split avoidance. We can optimise the second scan by skipping htids no
longer present in the index, though we'd need a spare bit to mark
that, which I'm not sure we have.
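A minimal sketch of the skipping idea, assuming a single index and assuming the spare bit exists (which, as said, it may not): TIDs whose index entry has already been removed elsewhere are filtered out before the later cleanup pass. The function name and the set standing in for the per-TID bit are hypothetical.

```python
def tids_needing_index_cleanup(dead_tids, already_removed):
    """Return the dead TIDs whose index entries still exist, in original order.

    `dead_tids` is a list of (block, offset) pairs found dead in the heap.
    `already_removed` is a set of TIDs whose index entry was already deleted,
    for example during btree split avoidance. The set stands in for the
    hypothetical spare bit on each line pointer.
    """
    return [tid for tid in dead_tids if tid not in already_removed]
```

With more than one index on the table a single bit per TID would not be enough to say the TID is gone from all of them, so this only shows the simplest case.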

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
