Re: some longer, larger pgbench tests with various performance-related patches

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <robertmhaas(at)gmail(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: some longer, larger pgbench tests with various performance-related patches
Date: 2012-02-04 02:16:02
Message-ID: 4F2C40820200002500044D96@gw.wicourts.gov
Lists: pgsql-hackers

Robert Haas wrote:

> A couple of things stand out at me from these graphs. First, some
> of these transactions had really long latency. Second, there are a
> remarkable number of seconds in all of the tests during which no
> transactions at all manage to complete, sometimes several seconds
> in a row. I'm not sure why. Third, all of the tests initially start
> off processing transactions very quickly, and get slammed down very
> hard, probably because the very high rate of transaction processing
> early on causes a checkpoint to occur around 200 s.

The amazing performance at the very start of all of these tests
suggests that there is a write-back cache (presumably battery-backed)
which is absorbing writes until the cache becomes full, at which
point actual disk writes become a bottleneck. The problems you
mention here, where no transactions complete, sound like the usual
problem that many people have complained about on the lists, where
the controller cache becomes so overwhelmed that activity seems to
cease while the controller catches up. Greg, and to a lesser degree
I, have written about this for years.

On the nofpw graph, I wonder whether the lower write rate just takes
that much longer to fill the controller cache. I don't think it's
out of the question that it could take 700 seconds instead of 200
seconds depending on whether full pages are being fsynced to WAL.
This effect is precisely why I think that on such machines the DW
(double-write) feature may be a huge help. If one small file is being
written to
and fsynced repeatedly, it stays "fresh" enough not to actually be
written to the disk (it will stay in OS or controller cache), and the
disks are freed up to write everything else, helping to keep the
controller cache from being overwhelmed. (Whether patches to date
are effective at achieving this is a separate question -- I'm
convinced the concept is sound for certain important workloads.)
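
To make the 200 s versus 700 s reasoning concrete, here is a rough
back-of-the-envelope sketch. It assumes a very simple model: writes
arrive at a constant rate, the disks drain the cache at a constant
rate, and the cache fills once the surplus adds up to its capacity.
The cache size and both rates are invented for illustration, not
measured on the test machine, and the function name is hypothetical.

```python
import math


def seconds_to_fill_cache(cache_mb, write_mb_per_s, drain_mb_per_s):
    """Time until a write-back cache fills, assuming constant rates.

    Only the surplus of incoming writes over what the disks can drain
    accumulates in the cache.
    """
    surplus = write_mb_per_s - drain_mb_per_s
    if surplus <= 0:
        return math.inf  # the disks keep up, so the cache never fills
    return cache_mb / surplus


# All numbers below are hypothetical, chosen only to illustrate the shape
# of the effect; they are not taken from the benchmark.
CACHE_MB = 512.0
DRAIN_MB_PER_S = 10.0

with_fpw = seconds_to_fill_cache(CACHE_MB, 12.5, DRAIN_MB_PER_S)
without_fpw = seconds_to_fill_cache(CACHE_MB, 10.75, DRAIN_MB_PER_S)

print(f"with full-page writes:    {with_fpw:.0f} s")
print(f"without full-page writes: {without_fpw:.0f} s")
```

Under these assumptions a write rate only 14% lower stretches the
fill time by more than a factor of three, because what matters is the
margin over the drain rate, not the write rate itself.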

> I didn't actually log when the checkpoints were occurring,

It would be good to have that information if you can get it for
future tests.
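
One way to capture that, assuming a stock postgresql.conf, is to turn
on checkpoint logging and put a timestamp in the log line prefix so
that checkpoint start and completion can be lined up against the
per-second transaction counts. The prefix shown is only one possible
choice.

```
# postgresql.conf
log_checkpoints = on        # log each checkpoint's start and completion
log_line_prefix = '%t '     # prefix every log line with a timestamp
```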

-Kevin
