Re: Never-idle autovacuum, and does (auto)vacuuming fill the OS cache ?

From: Vincent de Phily <vincent(dot)dephily(at)mobile-devices(dot)fr>
To: pgsql-general(at)postgresql(dot)org
Cc: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Subject: Re: Never-idle autovacuum, and does (auto)vacuuming fill the OS cache ?
Date: 2011-10-07 10:54:04
Message-ID: 2872956.hfbm651iNO@moltowork
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thursday 06 October 2011 12:09:30 Andrew Sullivan wrote:
> On Thu, Oct 06, 2011 at 03:39:02PM +0200, Vincent de Phily wrote:
> > In the case of vacuuming however, I think there's a point to be made
> > about finishing fast when all vacuum workers are constantly busy : say
> > the vacuum daemon notices that there are 10 tables that need vacuuming
> > now. It allocates 3 workers, but while they do their intentionally-slow
> > work, the other 7 tables keep creating more vacuumable tuples, so
> > it'll be more work overall because they're "late" in their "vacuum
> > schedule". Does that make sense (I'm not sure id does) ?
>
> Yes, that's exactly the issue. You need to balance the resource
> you're depriving the "real" database transactions (i.e. the user ones)
> against the cost of waiting, which waiting will probably cost those
> user transactions in performance. The reason there's no magic
> solution is because much of this depends on your use patterns.

Ok, I'm glad my reasoning wasn't completely flawed :)

> > Anyway, my particular issue is solved for now : I realized those tables
> > were terribly bloated (often more than 99% slack), so I vacuum-fulled
> > them and now the autovacuums run very fast and the disk is 90% idle
> > again. That slack probably appeared at table initialization time
> > because the fsm was not big enough. I since raised the fsm, but I think
> > it's big enough during normal (non-init) usage anyway.
>
> This is what you want to keep an eye on, then.

Yup, watching that.

> Why do you think it
> came from "initialization time", though? VACUUM only has work to do
> when dead tuples show up (e.g. from DELETE or UPDATE), and normally
> when you first populate a table you do a COPY, which isn't going to
> create dead tuples.

Those tables are a hand-made trigger-maintained "materialized view" created
about 2 months ago. Initializing them meant doing a full seqscan of the
reference table and doing one insert and 1-2 updates for each row in the MV
table. And the work was split in thousands of transactions with a load-
dependent sleep between them, in order to not impact user queries. Those
updates (and some inserts) still hapen during normal usage, but at a much
slower pace which autovacuum should have no trouble keeping up with.

> > I'm still interested in more opinions about my two questions :
> > * When does it make sense to make autovacuum more aggressive on IO,
> > and by>
> > how much ?
>
> At bottom, you don't want your tables to get so bloated that they
> exhibit the problem you just saw, but you also don't want vacuum to be
> taking so much I/O that your other tasks can't get done. That's the
> general principle; how it applies to your case depends rather on use
> patters. For instance, if you know that there will be at most 10%
> churn on every table every day, but all transactions happen between
> 9:00 and 17:00 local time, then it's probably safe to allow that to
> happen: as long as your FSM can keep track, it can all be recovered
> every day after 17:00, so you might as well allow the work to build
> up, & let the vacuums happen when they're not stealing any I/O from
> user queries. If, on the other hand, you get 100% churn on 50% of the
> tables every day between 09:00 and 11:00, and the rest of the day is
> mostly read-only traffic, with read-only traffic during all 24 hours
> (don't scoff -- I had exactly this problem once) then you want to be
> quite aggressive with the autovacuum settings, because keeping that
> 100% bloat down is going to pay off in a big way on the read-only
> traffic.

Interesting. Although if you have such well-defined churn times, it might be
better to vacuum from cron instead of from autovacuum ? You also don't want to
autovacuum now if you know your churn will be over in 15 min. Looks like it's
going to be hard to extract general rules.

One of my motivations to make autovaccum more aggresive was that my fsm was
too small and I didn't want a PG restart to take the new value into account
yet. So "finish this vacuum faster and get on to the next one" was a way to do
that "next one" before the fsm overflowed. But I now realize it's a very bad
kludge, and I should just have my fsm sized right (or sized automatically;
have I already said that I long to upgrade ? :p)

> > * Does vacuuming fill the OS's disk cache, and is it an issue if it
> > does ?
> Well, it _affects_ the OS's disk cache. Whether it fills it is
> controlled by the cache algorithms and the amount of memory you have
> devoted to cache. Every time you touch the disk, you potentially
> alter the cache in favour of what you just saw.
>
> In the above artificial examples, the vacuums that run "after everyone
> went home" will almost certainly end up taking over the cache, because
> there's no other activity to keep other things in the disk cache. In
> the second example, though, with a lot of read-only activity all the
> time, the things that are most popular are likely to remain in a
> (modern) disk cache most of the time because they're called so often
> that the vacuumed page doesn't end up being enough traffic to cause an
> eviction (or, anyway, to evict for any significant time).

Ok, so say my churn happens only in the last 10 minutes of data and readonly
queries only look at the last 24 hours of data, if vacuuming is triggered
every 48 hours, that's 24 hours of data that will potentially get back into
the cache with no benefit (or if I'm not mistaken, with PG =< 8.3 it's much
more than 24 hours). Pity. Is there a counter-example where there is a caching
benefit to the current behaviour ? If not, that might be a low-hanging fruit
to improve postgres performance.

Thanks for your interesting replies.

--
Vincent de Phily

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pavel Stehule 2011-10-07 10:56:16 Re: Can we use sqlcode in functions of plpgsql
Previous Message tushar nehete 2011-10-07 10:40:21 Can we use sqlcode in functions of plpgsql