Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date: 2013-02-18 15:50:44
Message-ID: 20130218155043.GB4739@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra wrote:

> So, here's v10 of the patch (based on the v9+v9a), that implements the
> approach described above.
>
> It turned out to be much easier than I expected (basically just a
> rewrite of the pgstat_read_db_statsfile_timestamp() function.

Thanks. I'm giving this another look now. I think the new code means
we no longer need the first_write logic; just let the collector idle
until we get the first request. (If for some reason we considered that
we should really be publishing initial stats as early as possible, we
could just do a write_statsfiles(allDbs) call before entering the main
loop. But I don't see any reason to do this. If you do, please speak
up.)

Also, it seems to me that the new pgstat_db_requested() logic is
slightly bogus (in the "inefficient" sense, not the "incorrect" sense):
we should be comparing the timestamp of the request vs. what's already
on disk instead of blindly returning true if the list is nonempty. If
the request is older than the file, we don't need to write anything and
can discard the request. For example, suppose that backend A sends a
request for a DB; we write the file. If then quickly backend B also
requests stats for the same DB, with the current logic we'd go write the
file, but perhaps backend B would be fine with the file we already
wrote.

Another point is that I think there's a better way to handle nonexistant
files, instead of having to read the global file and all the DB records
to find the one we want. Just try to read the database file, and only
if that fails try to read the global file and compare the timestamp (so
there might be two timestamps for each DB, one in the global file and
one in the DB-specific file. I don't think this is a problem). The
point is avoid having to read the global file if possible.

So here's v11. I intend to commit this shortly. (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
stats-split-v11.patch text/x-diff 39.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2013-02-18 17:12:59 Re: Call for Google Summer of Code mentors, admins
Previous Message Merlin Moncure 2013-02-18 15:42:08 Re: JSON Function Bike Shedding