Re: Still recommending daily vacuum...

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Still recommending daily vacuum...
Date: 2007-07-04 22:10:52
Message-ID: 20070704221052.GL5035@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Gregory Stark wrote:
>
> "Alvaro Herrera" <alvherre(at)commandprompt(dot)com> writes:
>
> >> FWIW, I normally go with the 8.2 defaults, though I could see dropping
> >> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
> >> could be decreased further, maybe divide by 10.
> >
> > How about pushing thresholds all the way down to 0?
>
> My intuition is that the thresholds should be lowered to about 5%.
>
> I may be biased by the TPC-C schema where the largest table, stock, gets a
> little over 20 records per page so 5% represents an average of one update per
> page. But there's nothing unusual about a table like that. Waiting until 20%
> of the table is potentially dead --four dead tuples out of 20 per page in the
> stock table case-- seems extravagantly wasteful.
>
> I find the idea of lowering the thresholds to 0 sort of intriguing though.
> That makes the vacuum delay parameters the primary method to control how
> frequently vacuum runs.

I think you are mixing thresholds with scale factors.

vacuum tuples = threshold + reltuples * scale factor

If dead tuples are more than vacuum tuples, autovac does a vacuum.

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations. In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway. So why not let the threshold be 0 and be done with it?

I, too, find a 0.2 scale factor a bit high. But since I don't run any
database, I fear I would be picking numbers out of thin air.

> In an ideal world autovacuum would be able to set the delay settings based on
> how many updates had happened since the last run started. If more than 5% of
> the table was cleaned by vacuum then decrease the delay settings to get this
> vacuum to finish sooner and allow fewer updates. If less than 5% of the table
> was cleaned by vacuum then increase the delay settings to reduce the
> unnecessary impact of vacuum. But that just leaves us back where we started.

Maybe we can construct some smarts based on something like this. The
equations we currently use are just inherited from contrib autovac,
which didn't have access to much other info. Integrated autovac can do
much better, I think.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-07-04 22:46:32 Re: ACM Paper relevant to our buffer algorithm
Previous Message Gregory Stark 2007-07-04 21:49:06 Re: Still recommending daily vacuum...

Browse pgsql-patches by date

  From Date Subject
Next Message Magnus Hagander 2007-07-05 04:59:57 Re: script binaries renaming
Previous Message Gregory Stark 2007-07-04 21:49:06 Re: Still recommending daily vacuum...