Re: Autovacuum in the backend

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum in the backend
Date: 2005-06-16 14:01:53
Message-ID: 42B18651.5000700@zeut.net
Lists: pgsql-general pgsql-hackers

Gavin Sherry wrote:

>On Thu, 16 Jun 2005, Matthew T. O'Connor wrote:
>
>
>>Would you mind expounding on why you think autovacuum isn't suitable for
>>the general public? I know it's not a silver bullet, but I think in
>>general, it will be helpful for most people.
>>
>>
>
>As I said, this is largely the fault of VACUUM. The main thing I'd like to
>see is a complete solution to the problem. I'm not picking on autovacuum.
>However, I will elaborate a little on why I think autovacuum should not
>be a feature of the backend:
>
>

Don't worry, I don't think you are picking on AV.

>1) The main argument so far is that autovacuum will ensure that users who
>do not read the maintenance section of the manual will not notice a
>deterioration of performance. This means that we anticipate autovacuum
>being on by default. This suggests that the default autovacuum
>configuration will not need tuning. I do not think that will be the case.
>
>

I disagree with this. I think the newbie protection benefits of AV are
not its primary goal, though I do think they are important. The
main thing AV brings is the ability to control bloat in your database
and keep your stats up to date no matter what your workload. It is
possible for an admin to set up cron scripts to run VACUUM or ANALYZE on
particularly needy tables at appropriate intervals, but I guarantee that
the cron script is going to fire either too many or too few VACUUMs.
Also, when the workload changes or a new table is added, the admin then
needs to update the cron scripts. This all goes away with AV, and I
believe this is a much bigger goal than the newbie problem.
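
To make the contrast with a fixed cron schedule concrete, here is a rough
sketch of an activity-based trigger, assuming the kind of rule
pg_autovacuum applies: a table is due once its dead rows exceed a base
value plus a scaling factor times its row count. The function names, the
table names, and the numbers are made up for illustration and are not the
defaults of any release.

```python
def vacuum_threshold(reltuples, base=1000, scale=0.5):
    """Dead-row count above which a table is considered due for VACUUM.

    base and scale are illustrative values only (assumption).
    """
    return base + scale * reltuples


def needs_vacuum(dead_rows, reltuples, base=1000, scale=0.5):
    """True when the table has accumulated more dead rows than its threshold."""
    return dead_rows > vacuum_threshold(reltuples, base, scale)


# Two hypothetical tables with the same number of dead rows but very
# different sizes. A fixed schedule would treat them alike.
tables = {
    "hot_queue": {"reltuples": 10_000, "dead_rows": 9_000},
    "big_archive": {"reltuples": 5_000_000, "dead_rows": 9_000},
}

due = [
    name
    for name, t in tables.items()
    if needs_vacuum(t["dead_rows"], t["reltuples"])
]
print(due)
```

Only the small, heavily updated table comes up as due here. The decision
follows the activity on each table, so it adapts when the workload changes
or a new table shows up, without anyone editing a schedule.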

>2) By no fault of its own, autovacuum's level of granularity is the table
>level. For people dealing with non-trivial amounts of data (and we're not
>talking gigabytes or terabytes here), this is a serious drawback. Vacuum
>at peak times can cause very intense IO bursts -- even with the
>enhancements in 8.0. I don't think the solution to the problem is to give
>users the impression that it is solved and then vacuum their tables during
>peak periods. I cannot stress this enough.
>
>

I agree this is a major problem with VACUUM, but I also think it's a
different problem. One advantage of integrated AV is that you will be
able to set per-table thresholds, which include the ability to turn off
AV for any given table. If you are running a database with tables this
big, I think you will be able to figure out how to customize integrated
AV to your needs.
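
As a sketch of what per-table thresholds could look like, assuming a
simple scheme where each table may override the global values or opt out
entirely: the names below (TableSettings, should_autovacuum, the GLOBAL_*
values) are hypothetical and are not the actual interface of integrated
AV.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical global settings, used when a table has no override.
GLOBAL_BASE = 1000
GLOBAL_SCALE = 0.5


@dataclass
class TableSettings:
    """Per-table overrides; None means "use the global value"."""
    enabled: bool = True
    base: Optional[int] = None
    scale: Optional[float] = None


def should_autovacuum(dead_rows, reltuples, settings=None):
    """Decide whether a table is due, honoring per-table overrides."""
    settings = settings or TableSettings()
    if not settings.enabled:
        # AV is turned off for this table; the admin vacuums it by hand,
        # for example during an off-peak window.
        return False
    base = GLOBAL_BASE if settings.base is None else settings.base
    scale = GLOBAL_SCALE if settings.scale is None else settings.scale
    return dead_rows > base + scale * reltuples


# A very large table excluded from AV, and a small one tuned more aggressively.
huge = TableSettings(enabled=False)
small_hot = TableSettings(base=100, scale=0.05)

print(should_autovacuum(9_000_000, 10_000_000, huge))
print(should_autovacuum(200, 1_000, small_hot))
```

With something along these lines, the very large tables that cause IO
bursts can be left out of AV and handled on the admin's own schedule,
while everything else is still taken care of automatically.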

>3) autovacuum on by default means row level stats are on by default. This
>will have a non-trivial performance impact on users, IMHO. For right or
>wrong, our users take the postgresql.conf defaults pretty seriously and
>this level of stats collection could and will remain enabled in some
>non-trivial percentage of users who turn autovacuum off (consider many
>users' reluctance to change shared_buffers in previous releases). To quote
>from the README:
>
>"The overhead of the stats system has been shown to be significant under
>certain workloads. For instance, a tight loop of queries performing
>"select 1" was found to run nearly 30% slower when row-level stats were
>enabled."
>
>I'm not one for "select 1" benchmarks but this is a problem that hasn't
>even been mentioned, as far as I recall.
>
>

I mentioned this in the README because I thought I should, not because I
think it's a real problem in practice. I think a real production
database doing queries that are any more complicated than "select 1"
will probably not notice the difference.

>4) Related to this, I guess, is that a user's FSM settings might be
>completely inappropriate. The 'Just read the manual' or 'Just read the
>logs' argument doesn't cut it, because the main argument for autovacuum in
>the backend is that people do not and will not.
>
>

Agreed, it doesn't solve all problems, and I'm not arguing that the
integration of AV makes PostgreSQL newbie safe; it just helps reduce the
newbie problem. Again, if the default FSM settings are inappropriate
for a database, then the user is probably doing something more
complicated than a "my cat minka" database and will need to learn some
tuning skills anyway.

>5) It doesn't actually shrink tables -- ie, there's no VACUUM FULL. If
>we're telling users about VACUUM less often than we are now, there's bound
>to be bloating issues (see 4).
>
>

Not totally true: regular VACUUM can shrink tables a little (I think
only if there is free space at the end of the table that it can cut off
without moving data around). But if AV is on and the settings are
reasonable, then a table shouldn't bloat much or at all. Also, I don't
think we are telling people to VACUUM less; in fact, tables that need it
will usually get VACUUM'd more. We are just telling users that if
they turn AV on, they don't have to manage all the VACUUMing.
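
A toy illustration of the "free space at the end of the table" case,
assuming a table is modeled as a list of per-page live row counts. The
function name and the model are made up for illustration and say nothing
about how VACUUM is implemented internally.

```python
def truncatable_pages(live_rows_per_page):
    """Count the trailing pages that hold no live rows.

    Only these pages can be cut off without moving any data; an empty
    page in the middle of the table stays where it is.
    """
    count = 0
    for live in reversed(live_rows_per_page):
        if live != 0:
            break
        count += 1
    return count


# Page 1 is empty but sits between pages with live rows, so only the
# last two pages could be given back.
pages = [40, 0, 12, 0, 0]
print(truncatable_pages(pages))  # prints 2
```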

>I guess the main point is, if something major like this ships in the
>backend it says to users that the problem has gone away. pg_autovacuum is
>a good contrib style solution: it addresses a problem users have and
>attempts to solve it the way other users might try and solve it. When you
>consider it in the backend, it looks like a workaround. I think users are
>better served by solving the real problem.
>

Which problem goes away? The problem of users forgetting to VACUUM does
go away; the problem of the VACUUM command being problematic on large
tables doesn't, but that is a different question.

My basic position is that there will always (or at least for the
foreseeable future) be some maintenance that needs to be done on a
database, whether by hand, by cron, or by integrated AV, and that AV does
this better than cron does. When VACUUM is improved, the semantics of AV
might change, but the maintenance work will still need to be done.

Matt
