Re: Some ideas about Vacuum

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>, pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some ideas about Vacuum
Date: 2008-01-09 16:40:05
Message-ID: 4784F8E5.4020403@bluegap.ch
Lists: pgsql-hackers

Hi,

Gregory Stark wrote:
> That's an interesting thought. I think your caveats are right but with some
> more work it might be possible to work it out. For example if a background
> process processed the WAL and accumulated an array of possibly-dead tuples to
> process in batch. It would wait whenever it sees an xid which isn't yet past
> globalxmin, and keep accumulating until it has enough to make it worthwhile
> doing a pass.

I don't understand why one would want to go via the WAL; that only
creates needless I/O. Better to accumulate the data right away, during
the inserts, updates and deletes. Spilling the accumulated data to disk,
if absolutely required, would presumably still result in less I/O.
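
To make that concrete, here is a minimal sketch of the
accumulate-at-DML-time idea (illustrative Python, not PostgreSQL
internals; every name in it is invented for the example):

    # Illustrative sketch only -- hypothetical names, not real PostgreSQL code.
    # Each backend notes (xid, page, offset) at DELETE/UPDATE time; a background
    # worker reaps entries in batches once their xid is behind the global xmin.
    import heapq

    class DeadTupleAccumulator:
        def __init__(self, batch_size=1000):
            self.batch_size = batch_size
            self.heap = []          # (deleting_xid, page, offset), ordered by xid

        def note_dead(self, xid, page, offset):
            # Called from the DML path, instead of re-reading the WAL later.
            # A real implementation would spill to disk past some memory limit.
            heapq.heappush(self.heap, (xid, page, offset))

        def reap(self, global_xmin):
            # Collect entries whose deleting xid is no longer visible to anyone.
            batch = []
            while self.heap and self.heap[0][0] < global_xmin:
                batch.append(heapq.heappop(self.heap))
            if len(batch) >= self.batch_size:
                return batch        # enough to make a cleaning pass worthwhile
            for entry in batch:     # too few: push back and keep waiting
                heapq.heappush(self.heap, entry)
            return []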

> I think a bigger issue with this approach is that it ties all your tables
> together. You can't process one table frequently while some other table has
> some long-lived deleted tuples.

Don't use the WAL as the source of that information, and that issue is gone.

> I'm also not sure it really buys us anything over having a second
> dead-space-map data structure. The WAL is much larger and serves other
> purposes which would limit what we can do with it.

Exactly.

>> You seem to be assuming that only a few tuples have changed between vacuums,
>> so that the WAL could quickly guide the VACUUM processes to the areas where
>> cleaning is necessary.
>>
>> Let's drop that assumption, because by default, autovacuum_scale_factor is 20%,
>> so a VACUUM process normally kicks in after 20% of tuples have changed (disk
>> space is cheap, I/O isn't). Additionally, there's a default nap time of one
>> minute, and VACUUM is forced to take at least that much of a nap.
>
> I think this is exactly backwards. The goal should be to improve vacuum, then
> adjust the autovacuum_scale_factor as low as we can. As vacuum gets cheaper
> the scale factor can go lower and lower.

But you can't lower it endlessly; it's still a compromise, because a
lower setting also means fewer tuples cleaned per scan, which works
against the goal of minimizing the overall I/O cost of vacuuming.
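
To put rough numbers on that compromise, a deliberately simplified
back-of-the-envelope model (illustrative Python; the table sizes and
function name are made up):

    # Back-of-the-envelope: over a fixed amount of churn, the number of vacuum
    # passes scales as 1/scale_factor, while the dead tuples reclaimed per pass
    # scale as scale_factor * live_tuples.  PostgreSQL's actual trigger is
    # autovacuum_vacuum_threshold + scale_factor * reltuples; the small fixed
    # base threshold is ignored here.
    def vacuum_passes(live_tuples, churned_tuples, scale_factor):
        dead_per_pass = scale_factor * live_tuples   # trigger threshold
        passes = churned_tuples / dead_per_pass      # scans over that churn
        return passes, dead_per_pass

    for sf in (0.20, 0.05):
        passes, per_pass = vacuum_passes(10_000_000, 40_000_000, sf)
        print(f"scale_factor={sf}: {passes:.0f} passes, "
              f"{per_pass:,.0f} dead tuples per pass")

Lowering the factor from 20% to 5% quadruples the number of passes and
quarters the work done per pass; whether that is a net win depends on
the fixed per-pass cost, e.g. scanning the indexes.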

> We shouldn't allow the existing
> autovacuum behaviour to control the way vacuum works.

That's a fair point.

> As a side point, "disk is cheap, I/O isn't" is a weird statement. The more
> disk you use the more I/O you'll have to do to work with the data.

That's only true as long as you need *all* your data to work with it.

> I still
> maintain the default autovacuum_scale_factor is *far* too liberal. If I had my
> druthers it would be 5%. But that's mostly informed by TPCC experience, in
> real life the actual value will vary depending on the width of your records
> and the relative length of your transactions versus transaction rate. The TPCC
> experience is with ~ 400 byte records and many short transactions.

Hm.. 5% vs 20% would mean 4x as many vacuum scans, but only a 15
percentage point difference in peak table size (105% vs 120% of the
live data), right? Granted, that extra 15% also occupies memory and
caches, resulting in additional I/O... Still, those numbers surprise
me. Or am I missing something?
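
For reference, the arithmetic behind those figures, under the same
simplified steady-state model as above (illustrative only):

    # Peak table size just before a vacuum pass triggers: live data plus the
    # dead tuples accumulated up to the scale-factor threshold.
    for sf in (0.20, 0.05):
        peak = 1.0 + sf         # 120% vs. 105% of the live data
        scans = 0.20 / sf       # scans relative to the 20% default
        print(f"scale_factor={sf:.0%}: peak size {peak:.0%} of live data, "
              f"{scans:.0f}x as many scans as the default")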

Regards

Markus
