Re: Too many autovacuum workers spawned during forced auto-vacuum

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Too many autovacuum workers spawned during forced auto-vacuum
Date: 2017-01-22 21:59:34
Message-ID: a443c6aa-efe1-105b-3f5f-7d50f7caee00@BlueTreble.com
Lists: pgsql-hackers

On 1/20/17 12:40 AM, Amit Khandekar wrote:
> My impression was that the postmaster is supposed to do just the
> minimal work of starting the auto-vacuum launcher if it's not already
> running. And the work of ensuring everything keeps going is the job of
> the auto-vacuum launcher.

There's already a ton of logic in the launcher... ISTM it'd be nice to
not start adding additional logic to the postmaster. If we had a generic
need for rate limiting launching of things maybe it wouldn't be that
bad, but AFAIK we don't.

>> That limits us to launching the
>> autovacuum launcher at most six times a minute when autovacuum = off.
>> You could argue that defeats the point of the SendPostmasterSignal in
>> SetTransactionIdLimit, but I don't think so. If vacuuming the oldest
>> database took less than 10 seconds, then we won't vacuum the
>> next-oldest database until we hit the next 64kB transaction ID
>> boundary, but that can only cause a problem if we've got so many
>> databases that we don't get to them all before we run out of
>> transaction IDs, which is almost unthinkable. If you had ten
>> million tiny databases that all crossed the threshold at the same
>> instant, it would take you 640 million transaction IDs to visit them
>> all. If you also had autovacuum_freeze_max_age set very close to the
>> upper limit for that variable, you could conceivably have the system
>> shut down before all of those databases were reached. But that's a
>> pretty artificial scenario. If someone has that scenario, perhaps
>> they should consider more sensible configuration choices.
> Yeah this logic makes sense ...

I'm not sure that's true in the case of a significant number of
databases and a very high XID rate, but I might be missing something. In
any case I agree it's not worth worrying about. If you've disabled
autovac you're already running with scissors.
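The 10-second backoff quoted above amounts to simple rate limiting of launcher starts. A minimal sketch of that idea (the function name and global are illustrative, not PostgreSQL's actual code, which does this with timestamps in the postmaster):

```python
import time

LAUNCHER_START_INTERVAL = 10.0   # seconds: at most six starts per minute
_last_start = float("-inf")      # monotonic time of the last launcher start

def maybe_start_launcher(now=None):
    """Allow a launcher start only if at least LAUNCHER_START_INTERVAL
    seconds have elapsed since the previous start; otherwise skip it."""
    global _last_start
    now = time.monotonic() if now is None else now
    if now - _last_start < LAUNCHER_START_INTERVAL:
        return False          # rate-limited: too soon since last start
    _last_start = now
    return True               # caller would fork the launcher here

print(maybe_start_launcher(now=0.0))    # True: first start always allowed
print(maybe_start_launcher(now=5.0))    # False: within the 10 s window
print(maybe_start_launcher(now=12.0))   # True: window has elapsed
```

With autovacuum = off, each SendPostmasterSignal request that lands inside the window is simply absorbed, which is what caps the spawn rate.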

> But I guess, from looking at the code, it seems care was taken to
> ensure that when auto-vacuum is off, we clean up all databases as fast
> as possible, with multiple workers cleaning up multiple tables in
> parallel.
>
> Instead of the autovacuum launcher and worker together making sure
> the cycle of iterations keeps running, I was thinking the
> auto-vacuum launcher itself should make sure it does not spawn another
> worker on the same database if it did nothing. But that seemed pretty
> invasive.

IMHO we really need more sophisticated scheduling for both the launcher
and the workers. Somewhere on my TODO is allowing a worker to run a
user-defined SELECT to get a prioritized list, but since the launcher
doesn't connect to a database that wouldn't work there. What we could do
rather simply is honor adl_next_worker in the logic that looks for
databases needing a freeze, something like the attached.
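To make the idea concrete (this is a hypothetical sketch of the scheduling rule, not the attached patch; DbEntry, frozenxid_age, and pick_for_freeze are illustrative names): when picking a database for anti-wraparound work, skip any database whose adl_next_worker time hasn't arrived yet, so a just-serviced database isn't immediately rescheduled.

```python
from dataclasses import dataclass

@dataclass
class DbEntry:
    name: str
    frozenxid_age: int    # wraparound urgency: higher = more urgent
    next_worker: float    # earliest time a worker may launch (adl_next_worker)

def pick_for_freeze(dbs, now):
    """Return the most wraparound-urgent database whose next_worker
    time has passed, or None if every candidate was serviced recently."""
    eligible = [db for db in dbs if db.next_worker <= now]
    return max(eligible, key=lambda db: db.frozenxid_age, default=None)

dbs = [
    DbEntry("olddb", 900_000, next_worker=1500),  # most urgent, just visited
    DbEntry("middb", 800_000, next_worker=900),
    DbEntry("newdb", 100_000, next_worker=500),
]
print(pick_for_freeze(dbs, now=1000).name)  # middb: olddb is deferred
```

The point is the filter step: without it, the launcher keeps re-selecting the database with the oldest datfrozenxid even when a worker just finished there and found nothing to do.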

On another note, does anyone else find the database selection logic
rather difficult to trace through? The logic is kinda spread throughout
several functions. The naming of rebuild_database_list() and
get_database_list() is rather confusing too.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

Attachment Content-Type Size
autovac.patch text/plain 3.3 KB
