Re: autovacuum default parameters

From: Jim Nasby <decibel(at)decibel(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum default parameters
Date: 2007-07-24 16:50:11
Message-ID: 29937A0C-2E89-478B-9D10-6E99878E2110@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 24, 2007, at 1:02 AM, Gregory Stark wrote:
> "Alvaro Herrera" <alvherre(at)commandprompt(dot)com> writes:
>> We didn't, but while I agree with the idea, I think 5% is too low. I
>> don't want autovacuum to get excessively aggressive. Is 10% not
>> enough?
>
> Well let me flip it around. Would you think a default fillfactor of
> 10% would
> be helpful or overkill? I think it would nearly always be overkill
> and waste
> heap space and therefore cache hit rate and i/o bandwidth.
>
> I get my 5% intuition from the TPCC stock table which has about 20
> tuples per
> page. That means a fillfactor or vacuum at 5% both translate into
> trying to
> maintain a margin of one tuple's worth of space per page. Enough
> for an update
> to happen without migrating to a new page.
>
> That's actually a fairly wide table though. A narrower table could
> easily have
> 50-100 tuple per page which would require only 1-2% of dead space
> overhead.
>
> <idle speculation>
>
> Perhaps the two parameters should be tied together and we should
> make the
> autovacuum parameter: max(1%, min(10%, fillfactor(table))) and make
> the
> default fill factor 5%.
>
> Hm. We have the width of the table in the stats don't we? We could
> actually
> calculate the "1 tuple's worth of space" percentage automatically on a
> per-table basis. Or for that matter instead of calculating it as a
> percentage
> of the whole table, just compare the number of updates/deletes with
> the number
> of pages in the table.
>
> </speculation>
>
>> How about the analyze scale factor, should we keep the current
>> 10%? I
>> have less of a problem with reducing it further since analyze is
>> cheaper
>> than vacuum.
>
> My "try to maintain one tuple's worth of space" model doesn't
> answer this
> question at all. It depends entirely on whether the ddl is changing
> the data
> distribution.
>
> Perhaps this should be 1/max(stats_target) for the table. So the
> default would
> be 10% but if you raise the stats_target for a column to 100 it
> would go down
> to 1% or so.
>
> The idea being that if you have ten buckets then updating 1/10th of
> the rows
> stands an even chance of doubling or halving the size of your
> bucket. Except
> there's no math behind that intuition at all and I rather doubt it
> makes much
> sense.
>
> Actually I feel like there should be a factor of 2 or more in there
> as well.
> If you modify 1/10th of the rows and you have 10 buckets then we
> should be
> analyzing *before* the distribution has a chance to be modified beyond
> recognition.
>
> Perhaps I shouldn't have closed the <speculation> tag so early :)
> The problem
> if we try to calculate reasonable defaults like this is it makes it
> unclear
> how to expose any knob for the user to adjust it if they need to.

In reality, I think trying to get much below 10% on any large-ish
production systems just isn't going to work well. It's starting to
approach the point where you need to be vacuuming continuously, which
is going to put us right back into starvation territory.

Put another way, there's only so low you can get table bloat with
vacuum as it currently stands. If you want to do better, you need
things like HOT and DSM.

Regarding page splits, it might make sense to drop the fillfactor a
bit. I'm thinking that in most cases, the difference between 85% and
90% won't be noticed. For cases where it will matter (ie: insert-
only), you'd want to set fillfactor to 100% anyway.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-07-24 16:53:02 Re: strange buildfarm failure on lionfish
Previous Message peter.trautmeier 2007-07-24 16:26:14 Design: Escort info from WHERE clause to executor?