Re: Postgres 9.1.4 - high stats collector IO usage

From: David Barton <dave(at)oneit(dot)com(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres 9.1.4 - high stats collector IO usage
Date: 2012-08-13 02:17:28
Message-ID: 502863B8.4000203@oneit.com.au
Lists: pgsql-performance

Hi Jeff,

Thanks for the detailed reply.

On 13/08/12 05:23, Jeff Janes wrote:
> On Fri, Jul 27, 2012 at 9:33 PM, David Barton <dave(at)oneit(dot)com(dot)au> wrote:
>> Hi,
>>
>> <snip>
>> I have roughly 150 different databases, each of which is running in 1 of
>> roughly 30 tablespaces. The databases are small (the dumps of most are
>> under 100M, and all but 3 are under 1G, nothing larger than 2G).
> It isn't the size of the data that matters, but the number of objects.
> It sounds like your databases have about 150 statistics-containing
> objects each, in order to come up with a 3.5MB stats file.
>
> What do you gain by using databases rather than schemas to do the segregation?
I had never imagined there was such a profound difference between
using schemas and using databases. It sounds like I could convert from
databases to schemas.
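
For reference (my own back-of-envelope check, not anything Jeff said):
something like this, run in each database, should roughly count the
relations that carry stats entries, since tables, indexes and TOAST
tables each get one:

    -- rough count of stats-bearing relations in the current database
    SELECT count(*)
    FROM pg_class
    WHERE relkind IN ('r', 'i', 't');
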
>
>> Previously, iotop reported the disk write speed at ~6MB/second.
> So that corresponds to about 2 physical write-outs of the stats file
> per second. Are you using ext4? It has the peculiar (to me) property
> that when a file is renamed out of existence, it writes out all of
> that file's obsolete dirty buffers, rather than just dropping them as
> uninteresting to anyone. That generates about 10 times the physical
> IO as the ext3 file system does. And of course about infinite times
> the physical IO as a tmpfs.
It was previously on ext3 and was moved to ext4. That didn't seem to
make a difference; I'm guessing the higher IO on the new server was
simply because it was capable of sustaining it.
>
>> FWIW, I just migrated all these databases over to this new server by
>> restoring from pg_dump. I was previously experiencing this on 8.3, which
>> was why I upgraded to 9.1, and I also have another server with similar
>> problems on 9.1.
>>
>> Any help would be sincerely appreciated.
> I think the first line of defense would be using /dev/shm to hold the
> stats file. I don't see any downside to that. You are reading and
> writing that file so ferociously anyway that it is always going to be
> taking up RAM, no matter where you put it. Indeed, under ext4 you
> might even have several copies of it all locked into RAM as they
> wait to reach the disk before being dropped.
>
> Increasing the naptime, as you have already done, will also decrease
> the physical IO, but that has the trade-off of risking bloat. (But
> since you are running 150 databases on one machine, I doubt any of
> them are active enough for the risk of bloat to be all that great).
> However using /dev/shm should eliminate the IO entirely with no
> trade-off at all.
>
> But with /dev/shm the CPU usage of repeatedly formatting, writing,
> reading, and parsing the stat file will still be considerable, while
> increasing the naptime will reduce that as well.
The CPU overhead seems pretty minimal, and a slight reduction in naptime
should be more than enough.
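
For anyone else following along, I believe the two settings being
discussed live in postgresql.conf and would look something like this
(the tmpfs path is just an example; it must exist and be writable by
the postgres user):

    # keep the stats temp file on a RAM-backed filesystem
    stats_temp_directory = '/dev/shm/pg_stats_temp'

    # wake the autovacuum launcher less often to cut stats file churn
    # (the default is 1min)
    autovacuum_naptime = 5min
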
>
> As far as coding changes to overcome the fundamental problem:
>
> A relatively easy change would be to make any given autovacuum worker
> on start up tolerate a stats file that is out of date by up to, say,
> naptime/5. That would greatly reduce the amount of writing the stats
> collector needs to do (assuming that few tables actually need
> vacuuming during any given cycle), but wouldn't change the amount of
> reading a worker needs to do because it still needs to read the file
> each time as it doesn't inherit the stats from anyone. I don't think
> it would be a problem that a table which becomes eligible for
> vacuuming in the last 20% of a cycle would have to wait for one more
> round. Especially as this change might motivate one to reduce the
> naptime since doing so will be cheaper.
If the stats are mirrored in memory, then that makes sense. Of course,
if that's the case then couldn't we just alter the stats to flush at
most once per N seconds / minutes? If the stats are not mirrored in
memory, doesn't that imply that most of the databases will never flush
updated stats to disk and so the file will become stale?
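
As a crude way to watch for that staleness, I believe something like
this (superuser only, and assuming the default stats_temp_directory)
shows when the on-disk file was last rewritten:

    -- modification time of the stats file, relative to the data directory
    SELECT modification
    FROM pg_stat_file('pg_stat_tmp/pgstat.stat');
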
>
> But it seems like maybe the stats collector could use a ground-up
> recoding. Maybe it could use a shared relation to store the stats
> within the database cluster itself, so that edits could be done in
> place per database rather than re-writing the entire cluster's stats?
> But I certainly am not volunteering to take on that task.
>
> A compromise might be to have one stats file per database. That way
> any given backend only needs to read in the database file it cares
> about, and the stats collector only needs to write out the one
> database asked of it. This change could be mostly localized to just
> pgstat.c, I think.
That approach was what I had thought of. I am not a C programmer by any
stretch of the imagination, which is why I asked if there was a place to
fund this kind of thing. It seems likely there are a few features that
people would be willing to put money towards.

>
> Cheers,
>
> Jeff
>
>
Regards,
David
