Re: autovacuum stress-testing our system

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum stress-testing our system
Date: 2012-09-26 22:48:34
Message-ID: 50638642.2080606@fuzzy.cz
Lists: pgsql-hackers

On 26.9.2012 18:29, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>>> 5) splitting the single stat file into multiple pieces - e.g. per database,
>>>> written separately, so that the autovacuum workers don't need to read all
>>>> the data even for databases that don't need to be vacuumed. This might be
>>>> combined with (4).
>
>>> IMHO that's the definitive solution. It would be one file per database plus a
>>> global one. That way, the check would only read the global.stat and process
>>> those databases that were modified. Also, an in-memory map could store that
>>> information to speed up the checks.
>
>> +1
>
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

Well, it wouldn't, but it wouldn't make it worse either. Or at least
that's how I understand it.
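
To illustrate the point, here is a rough Python sketch of the split layout.
It is not PostgreSQL code: the class, its fields and the write counter are
all made up for illustration, and plain dicts stand in for the global file
and the per-database files. It only shows that a reader pays for the
databases it actually opens, so one huge database costs the same as before.

```python
# Hypothetical sketch, not PostgreSQL code: every name here is an assumption.
from dataclasses import dataclass, field


@dataclass
class SplitStats:
    # Stand-in for the global file: database name -> write counter at last write.
    global_stat: dict = field(default_factory=dict)
    # Stand-in for the per-database files: database name -> {table: dead tuples}.
    per_db: dict = field(default_factory=dict)
    # How many table entries readers have had to load so far.
    entries_read: int = 0
    # Monotonic counter used instead of a real timestamp.
    clock: int = 0

    def write_db(self, db, table_stats):
        """Rewrite one database's file and bump its entry in the global file."""
        self.clock += 1
        self.per_db[db] = dict(table_stats)
        self.global_stat[db] = self.clock

    def read_db(self, db):
        """Load a single database's file; the cost is that database's table count."""
        stats = self.per_db[db]
        self.entries_read += len(stats)
        return stats

    def modified_since(self, last_seen):
        """Consult only the global file to find databases written after last_seen."""
        return sorted(db for db, stamp in self.global_stat.items() if stamp > last_seen)


stats = SplitStats()
stats.write_db("small_db", {"t1": 10, "t2": 0})
stats.write_db("big_db", {f"cache_{i}": 0 for i in range(10000)})
checkpoint = stats.clock

# Only small_db changes after the checkpoint, so only its file is read.
stats.write_db("small_db", {"t1": 500, "t2": 0})
for db in stats.modified_since(checkpoint):
    stats.read_db(db)
```

In this toy model the check touches two table entries instead of 10002, but
a worker that does have to open big_db still reads all 10000 entries.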

> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway: the duplicated system
> catalogs are going to be mighty expensive, both in disk space and in
> their cache footprint in shared buffers. There was some speculation
> at the last PGCon about how we might avoid the duplication, but I think
> we're years away from any such thing actually happening.

You don't need to encourage us to do that ;-) We know it's not perfect
and are considering an alternative - e.g. a few databases (~10) with
schemas inside, replacing the current database-only approach. This way
we'd get multiple stat files (thus gaining the benefits) with less
overhead (catalogs shared by the schemas within each database).

And yes, using tens of thousands of tables (serving as "caches") for a
reporting solution is "interesting" (as in the old Chinese curse) too.

> What seems to me like it could help more is fixing things so that the
> autovac launcher needn't even launch a child process for databases that
> haven't had any updates lately. I'm not sure how to do that, but it
> probably involves getting the stats collector to produce some kind of
> summary file.

Yes, I've proposed something like this in my original mail - setting a
"dirty" flag on objects (a database in this case) whenever a table in it
becomes eligible for vacuum/analyze.
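
A minimal Python sketch of the flag idea follows. It is an illustration
only: the class and method names are hypothetical, and the single fixed
dead-tuple threshold is a simplification, not the eligibility rule that
autovacuum actually applies.

```python
# Hypothetical sketch of the "dirty" flag idea; names and threshold are assumptions.
class DirtyTracker:
    def __init__(self, threshold=50):
        # Dead tuples needed before a table counts as eligible (arbitrary value).
        self.threshold = threshold
        self.dead = {}      # (database, table) -> accumulated dead tuples
        self.dirty = set()  # databases with at least one eligible table

    def report(self, db, table, new_dead):
        """Record new dead tuples; flag the database once a table becomes eligible."""
        key = (db, table)
        self.dead[key] = self.dead.get(key, 0) + new_dead
        if self.dead[key] >= self.threshold:
            self.dirty.add(db)

    def databases_to_visit(self):
        """What a launcher would consult instead of starting a worker everywhere."""
        return sorted(self.dirty)

    def vacuumed(self, db):
        """Reset the database's counters and clear its flag after a worker finishes."""
        for key in [k for k in self.dead if k[0] == db]:
            self.dead[key] = 0
        self.dirty.discard(db)


tracker = DirtyTracker(threshold=50)
tracker.report("reports", "cache_1", 60)  # crosses the threshold, flags "reports"
tracker.report("archive", "log", 5)       # stays below it, "archive" is skipped
to_visit = tracker.databases_to_visit()
```

With this in place the launcher would only need the set of flagged
databases, not the per-table statistics, to decide where to start a worker.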

Tomas
