Re: 8.4 open item: copy performance regression?

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Alan Li" <ali(at)truviso(dot)com>
Subject: Re: 8.4 open item: copy performance regression?
Date: 2009-06-26 20:13:16
Message-ID: 4A44E58C0200002500027FCF@gw.wicourts.gov
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> The checkpoint_segments seems dramatic enough to be real. I wonder
>> if the test is short enough that it never got around to re-using
>> any of them, so it was doing extra writes for the initial creation
>> during the test?
>
> That's exactly what I was about to suggest. Are you starting each
> run from a fresh initdb? If so, try running the load long enough
> that the number of WAL files stabilizes (should happen at 2x
> checkpoint_segments) and then start the test measurement.

default conf (xlogs not populated)
real 3m49.604s
real 3m47.225s
real 3m45.831s

default conf (xlogs populated)
real 3m45.603s
real 3m45.284s
real 3m45.906s

default conf + checkpoint_segments = 100 (xlogs not populated)
real 4m27.629s
real 4m24.496s
real 4m22.832s

default conf + checkpoint_segments = 100 (xlogs populated)
real 3m52.746s
real 3m52.619s
real 3m50.418s

I used ten times the number of rows, to get more meaningful results.
To get the "populated" times, I just dropped the target table and
created it again; otherwise identical runs. Clearly, pre-populating
the xlog files reduces run time, especially for a large number of xlog
files; however, I still got better performance with a smaller set of
xlog files.
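
For anyone who wants to compare the four configurations at a glance, here is a minimal sketch that averages the "real" times listed above. The timings are copied verbatim from this message; the names (RUNS, to_seconds, AVERAGES) are just illustrative, and three runs per configuration is too few to say anything about variance.

```python
from statistics import mean

# Wall-clock "real" times copied from the runs reported above.
# Keys are (configuration, xlog state); the labels are illustrative.
RUNS = {
    ("default", "not populated"): ["3m49.604s", "3m47.225s", "3m45.831s"],
    ("default", "populated"): ["3m45.603s", "3m45.284s", "3m45.906s"],
    ("checkpoint_segments=100", "not populated"): ["4m27.629s", "4m24.496s", "4m22.832s"],
    ("checkpoint_segments=100", "populated"): ["3m52.746s", "3m52.619s", "3m50.418s"],
}


def to_seconds(real):
    """Convert a time-style 'XmY.YYYs' string into seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Mean run time in seconds for each configuration.
AVERAGES = {
    key: mean(to_seconds(t) for t in times) for key, times in RUNS.items()
}

for (conf, xlogs), avg in AVERAGES.items():
    print(f"{conf:24s} {xlogs:14s} {avg:8.3f}s")
```

The averages come out to roughly 227.6s, 225.6s, 265.0s and 231.9s respectively, so pre-populating saves about 2 seconds with the default conf and about 33 seconds with checkpoint_segments = 100, while the populated 100-segment case is still about 6 seconds slower than the populated default.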

Regarding the fact that even with the xlog files pre-populated, the
smaller set of xlog files is faster: I'm only guessing, but I suspect
the battery-backed RAID controller is what's defeating conventional
wisdom here. By writing to the same, relatively small, set of xlog
files repeatedly, some of the actual disk writes probably evaporate in
the BBU cache. More frequent checkpoints from the smaller number of
xlog files might also have caused data to start streaming to the disk
a little sooner, minimizing write gluts later.

I've often seen similar benefits from the BBU cache, which cause some
of the frequently given advice here to have no discernible effect or
to be counter-productive in our environment. (I know that some doubted
my claim that my aggressive background writer settings didn't increase
disk writes, but I couldn't even measure a difference there in the
writes from OS cache to the controller cache, much less anything which
indicated it actually increased physical disk writes.)

By the way, the number of xlog files seemed to always go to two above
2x checkpoint_segments.
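
To put that observation in concrete terms, here is a small sketch of the file count and disk footprint it implies. It encodes only what I saw in these runs (2x checkpoint_segments plus two), not a documented limit, and it assumes the stock 16 MB segment size and a default checkpoint_segments of 3; the function names are made up for illustration.

```python
# Assumption: stock build with 16 MB WAL segments.
SEGMENT_MB = 16


def observed_xlog_files(checkpoint_segments):
    """File count as observed in these runs: two above 2x checkpoint_segments.

    This is an empirical observation from this test, not a guaranteed bound.
    """
    return 2 * checkpoint_segments + 2


def xlog_footprint_mb(checkpoint_segments):
    """Approximate pg_xlog size in MB implied by the observed file count."""
    return observed_xlog_files(checkpoint_segments) * SEGMENT_MB


# Assumption: 3 is the default checkpoint_segments in the "default conf" runs.
for cs in (3, 100):
    print(cs, observed_xlog_files(cs), xlog_footprint_mb(cs))
```

Under those assumptions the default conf settles at 8 files (128 MB) and checkpoint_segments = 100 at 202 files (3232 MB), which gives a sense of how much more has to be created before the set stabilizes.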

-Kevin
