
Re: Spread checkpoint sync

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2011-01-28 05:53:24
Message-ID: 4D4259D4.5050207@2ndquadrant.com
Lists: pgsql-hackers
Robert Haas wrote:
> During each cluster, the system probably slows way down, and then recovers when
> the queue is emptied.  So the TPS improvement isn't at all a uniform
> speedup, but simply relief from the stall that would otherwise result
> from a full queue.
>   

That does seem to be the case here.
http://www.2ndquadrant.us/pgbench-results/index.htm now has results from
a long test series, at two database scales that caused many backend
fsyncs during earlier tests.  Set #5 is the existing server code, #6 is
with the patch applied.  There are zero backend fsync calls with the
patch applied, which isn't surprising given how simple the schema is in
this test case.  An average TPS gain of 14% appears at a scale of 500
and 8% at 1000; the attached CSV file summarizes the average figures
for the archives.  The gains do appear to come from smoothing out the
dead periods that normally occur during the sync phase of the checkpoint.
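
For anyone following along who hasn't looked at the mechanism: a backend
only ends up doing its own fsync when the request it tries to hand off
can't fit into the fixed-size request queue.  Here's a rough standalone
sketch of that fallback path; the queue size, names, and counter below
are made up for illustration and are not the actual PostgreSQL code:

/*
 * Minimal illustrative sketch, not the real server code: a backend that
 * evicts a dirty buffer forwards the fsync work through a fixed-size
 * request queue, and only when that queue is full does it fall back to
 * doing the fsync itself -- the "backend fsync" being counted above.
 */
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_SIZE 8            /* tiny on purpose, to show the overflow */

static int  queue[QUEUE_SIZE];  /* pending segment numbers to fsync */
static int  queue_len = 0;
static int  backend_fsyncs = 0; /* the statistic discussed above */

/* Try to hand the fsync off; false means "queue full". */
static bool
forward_fsync_request(int segno)
{
    if (queue_len >= QUEUE_SIZE)
        return false;
    queue[queue_len++] = segno;
    return true;
}

static void
evict_dirty_buffer(int segno)
{
    if (!forward_fsync_request(segno))
    {
        /* Slow path: the backend has to flush the file itself and stalls. */
        backend_fsyncs++;
    }
}

int
main(void)
{
    for (int segno = 0; segno < 20; segno++)
        evict_dirty_buffer(segno % 12);
    printf("backend fsyncs: %d\n", backend_fsyncs);
    return 0;
}

Every hit on that slow path stalls the backend for the duration of a
disk flush, which is why getting the count to zero matters even when the
average TPS only moves by single-digit percentages.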

For example, here are the fastest runs at scale=1000/clients=256 with 
and without the patch:

http://www.2ndquadrant.us/pgbench-results/436/index.html (tps=361)
http://www.2ndquadrant.us/pgbench-results/486/index.html (tps=380)

Here the reduction in the slowdown around the end of each checkpoint is
really obvious, and clearly an improvement.  You can see the same thing
to a lesser extent at the other end of the scale; here are the fastest
runs at scale=500/clients=16:

http://www.2ndquadrant.us/pgbench-results/402/index.html (tps=590)
http://www.2ndquadrant.us/pgbench-results/462/index.html (tps=643)

While there are still very ugly maximum latency figures in every case,
these periods just aren't as wide with the patch in place.

I'm moving on to some brief testing of the newer kernel behavior here,
then returning to testing the other checkpoint spreading ideas on top of
this compaction patch, presuming something like it will end up being
committed first.  I think it's safe to say I can throw away the changes
that tried to alter the fsync absorption code in what I submitted
before, as this scheme does a much better job of avoiding that problem
than those earlier queue alteration ideas.  I'm glad Robert grabbed the
right one from the pile of ideas I threw out for what else might help
here.
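
For reference, the basic idea behind the compaction is just
deduplication: since one fsync of a file satisfies every pending request
for it, a full queue can usually be shrunk in place instead of
overflowing onto the backends.  A toy standalone sketch of that idea
follows; the names and sizes are invented for the example rather than
the shared-memory structures the real patch touches:

/*
 * Illustrative sketch of fsync request queue compaction, not the actual
 * patch: when the queue is full, duplicate requests for the same segment
 * are collapsed, keeping only the first occurrence of each.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_SIZE   8
#define MAX_SEGMENTS 16

static int queue[QUEUE_SIZE];
static int queue_len = 0;

/* Remove duplicate segment numbers; return the new queue length. */
static int
compact_fsync_request_queue(void)
{
    bool seen[MAX_SEGMENTS];
    int  kept = 0;

    memset(seen, 0, sizeof(seen));
    for (int i = 0; i < queue_len; i++)
    {
        if (!seen[queue[i]])
        {
            seen[queue[i]] = true;
            queue[kept++] = queue[i];
        }
    }
    queue_len = kept;
    return kept;
}

int
main(void)
{
    int requests[] = {3, 5, 3, 7, 5, 3, 7, 5};  /* a full, repetitive queue */

    queue_len = QUEUE_SIZE;
    memcpy(queue, requests, sizeof(requests));

    printf("before: %d entries, after compaction: %d entries\n",
           QUEUE_SIZE, compact_fsync_request_queue());
    return 0;
}

With a simple pgbench schema most requests land on the same handful of
files, which is why compacting works so well in this test and no backend
ever has to take the fsync on itself.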

P.S. Yes, I know I have other review work to do as well.  Starting on 
the rest of that tomorrow.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Attachment: compact-fsync-pgbench.csv
Description: text/csv (546 bytes)
