From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum stress-testing our system
Date: 2012-09-26 14:51:03
Message-ID: CAMkU=1ymG8tR=xEjwLOKbSQNagA7VRs-TApQTxmCzeu5pPp2ww@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> First - our system really is not a "common" one - we have ~1000 databases of
> various sizes, each containing up to several thousand tables (several
> user-defined tables, the rest serve as caches for a reporting application -
> yes, it's a bit of a weird design, but that's life). This all leads to a
> pgstat.stat significantly larger than 60 MB.
>
...
> Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY
> value. With our current pgstat.stat file size / number of relations, this
> is quite common. Actually the common write time in our case is ~100 ms,
> even if we move the file into tmpfs. That means that almost all the calls
> to backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry
> calls) result in a continuous stream of inquiries from the autovacuum
> workers, and writing/reading of the file.

I don't think it actually does. You are missing the same thing I was
missing a few weeks ago when I looked into something similar.

From the comment in backend_read_statsfile (pgstat.c), around line 3962:

* We don't recompute min_ts after sleeping, except in the
* unlikely case that cur_ts went backwards.

That means the file must have been written within 10 ms of when we
*first* asked for it.

What is generating the endless stream you are seeing is that you have
1000 databases, so with a one-minute naptime the launcher is starting
about 16 workers per second. Since every database gets a new worker
process, each of those processes needs to read the stats file, as it
doesn't inherit one.

...
>
> First, I'm interested in feedback - did I get all the details right, or am I
> missing something important?
>
> Next, I'm thinking about ways to solve this:
>
> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron

Increasing autovacuum_naptime seems like a far better way to do
effectively the same thing.
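For instance (the value here is illustrative only; the right setting depends on how quickly your tables need attention):

```
# postgresql.conf -- illustrative value, tune for your workload
autovacuum_naptime = '10min'    # default is 1min; a 10x increase cuts
                                # worker launches (and stats-file reads)
                                # by the same factor
```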

>
> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and
> so on, to consider several seconds to be fresh enough - would be nice to
> have this as a GUC variable, although we can do another private patch on
> our own. But more knobs is not always better.

I think forking it off to another value would be better. If you are an
autovacuum worker that is just starting up and fetching its initial
stats, you can tolerate a stats file up to "autovacuum_naptime / 5.0"
stale. If you have already started up and are just about to vacuum a
table, keep the staleness limit at PGSTAT_RETRY_DELAY as it currently
is, so as not to redundantly vacuum a table.

>
> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on
> the time it takes to write the file (e.g. 10x the write time or something).

This is already in place.

> 5) splitting the single stat file into multiple pieces - e.g. per database,
> written separately, so that the autovacuum workers don't need to read all
> the data even for databases that don't need to be vacuumed. This might be
> combined with (4).

I think this needs to happen eventually.

Cheers,

Jeff
