
Re: autovacuum stress-testing our system

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum stress-testing our system
Date: 2012-09-26 16:14:15
Lists: pgsql-hackers
On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> Dne 26.09.2012 16:51, Jeff Janes napsal:
>> What is generating the endless stream you are seeing is that you have
>> 1000 databases so if naptime is one minute you are vacuuming 16 per
>> second.  Since every database gets a new process, that process needs
>> to read the file as it doesn't inherit one.
> Right. But that makes the 10ms timeout even more strange, because the
> worker is then using the data for very long time (even minutes).

On average that can't happen, or else your vacuuming would fall way
behind.  But I agree, there is no reason to need very fresh statistics
to start with.  naptime/5 seems like a good cutoff to me for the
start-up read.  If a table only becomes eligible for vacuuming in
the last 20% of the naptime, I see no reason it can't wait
another round.  But that just means the statistics collector needs to
write the file less often; the workers still need to read it once per
database, since each one vacuums only one database and doesn't inherit
the data from the launcher.

>> I think forking it off to to another value would be better.  If you
>> are an autovacuum worker which is just starting up and so getting its
>> initial stats, you can tolerate a stats file up to "autovacuum_naptime
>> / 5.0" stale.  If you are already started up and are just about to
>> vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
>> currently is, so as not to redundantly vacuum a table.
> I always thought there's a "no more than one worker per database" limit,
> and that the file is always reloaded when switching to another database.
> So I'm not sure how could a worker see such a stale table info? Or are
> the workers keeping the stats across multiple databases?

If you only have one "active" database, then all the workers will be
in it.  I don't know how likely it is that they will leapfrog each
other and collide.  But anyway, if you have 1000s of databases, then
each one will generally require zero vacuums per naptime (as you say,
it is mostly read-only), so it is the reads upon start-up, not the
reads per table that needs vacuuming, which generate most of the
traffic.  Once you separate those two parameters out, playing around
with the PGSTAT_RETRY_DELAY one seems like a needless risk.

>>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on
>>>    the time it takes to write the file (e.g. 10x the write time or something).
>> This is already in place.
> Really? Where?

I had thought that this part was effectively the same thing:

             * We don't recompute min_ts after sleeping, except in the
             * unlikely case that cur_ts went backwards.

But I think I did not understand your proposal.

> I've checked the current master, and the only thing I see in
> pgstat_write_statsfile is this (line 3558):
>   last_statwrite = globalStats.stats_timestamp;
> I don't think that's doing what I meant. That really doesn't scale the
> timeout according to write time. What happens right now is that when the
> stats file is written at time 0 (starts at zero, write finishes at 100 ms),
> and a worker asks for the file at 99 ms (i.e. 1ms before the write
> finishes), it will set the time of the inquiry to last_statrequest and
> then do this
>    if (last_statwrite < last_statrequest)
>       pgstat_write_statsfile(false);
> i.e. comparing it to the start of the write. So another write will start
> right after the file is written out. And over and over.

Ah.  I had wondered about this before too, and whether it would be a
good idea to have the writer go back to the beginning of the stats file
and overwrite the timestamp with the current time (rather than the time
it started writing), as the last action it does before the rename.  I
think that would automatically make it adaptive to the time it takes
to write out the file, in a fairly simple way.

> Moreover there's the 'rename' step making the new file invisible for the
> worker
> processes, which makes the thing a bit more complicated.

I think renames are assumed to be atomic: a reader sees either the old
file or the new one, but never neither.


