Re: 8.4 open item: copy performance regression?

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alan Li <ali(at)truviso(dot)com>
Subject: Re: 8.4 open item: copy performance regression?
Date: 2009-06-20 11:15:48
Message-ID: 4A3CC4E4.7050705@kaltenbrunner.cc
Lists: pgsql-hackers

Greg Smith wrote:
> On Fri, 19 Jun 2009, Stefan Kaltenbrunner wrote:
>
>> In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem
>> (8GB Fibre Channel connected NetApp with 4GB cache) are pretty fast.
>
> The server Alan identified as "Solaris 10 8/07 s10x_u4wos_12b X86" has a
> Xeon E5320 (1.86GHz) and a single boring SAS drive in it (Sun X4150).
> The filesystem involved in that particular case is UFS, which I am
> suspicious of as being part of why the problem is so pronounced
> there--the default UFS tuning is pretty lightweight in terms of how much
> caching it does. Not sure if Alan ran any tests against the big ZFS
> volume on the other server; I think all the results he posted were from
> the UFS boot drive there too.
>
>> so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in
>> more or less the same performance that 8.3 had.
>
> It looks like it's a touch higher on our 8/07 system, it levels out at
> 8192 * (haven't checked the other one yet). I'm seeing this, using
> Alan's original test set size (to make sure I was running exactly the
> same test) and just grabbing the low/high from a set of 3 runs:
>
> 8.3.7: 0m39.266s 0m43.269s (alan: 36.2 - 39.2)
>
> 256: 0m50.435s 0m51.944s (alan: 48.1 - 50.6)
> 1024: 0m47.299s 0m49.346s
> 4096: 0m43.725s 0m46.116s
> 8192: 0m40.715s 0m42.480s
> 16384: 0m41.318s 0m42.118s
> 65536: 0m41.675s 0m42.955s

Hmm, interesting. I just did a bunch of runs using the lineitem table
from the DBT3 tests (loading 60M rows in each run) and the same config
Alan used.

8.4 (post-patch; not RC1, but that one seems to behave exactly the same way)

lineitem1
multiplier  elapsed time
       256  9min38s
       512  9min20s
      1024  7m44.667s/7m45.342s
      2048  7m15.500s/7m17.910s
      4096  7m11.424s/7m13.276s
      8192  6m43.203s/6m48.293s
     16384  6m24.980s/6m24.116s
     32768  6m20.753s/6m22.083s
     65536  6m22.913s/6m22.449s
   1048576  6m23.765s/6m24.645s

8.3

6m45.650s/6m44.781s

So on this workload the sweet spot seems to be much higher than on the
one with the narrower rows.
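
To make the comparison with 8.3 easier to read, here is a rough Python
sketch that converts the timings listed above to seconds and expresses the
faster run at each ring multiplier as a ratio of the faster 8.3 run. The
helper names (`to_seconds`, `best`, `relative_to_baseline`) and the choice
of comparing the faster of the two runs are assumptions made for
illustration; they are not part of the test setup.

```python
import re


def to_seconds(text):
    """Convert a timing such as '7m44.667s' or '9min38s' to seconds."""
    m = re.fullmatch(r"(\d+)m(?:in)?(\d+(?:\.\d+)?)s", text)
    if m is None:
        raise ValueError(f"unrecognised timing: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Timings copied from the lineitem1 runs (ring multiplier -> run times).
RUNS_84 = {
    256: ["9min38s"],
    512: ["9min20s"],
    1024: ["7m44.667s", "7m45.342s"],
    2048: ["7m15.500s", "7m17.910s"],
    4096: ["7m11.424s", "7m13.276s"],
    8192: ["6m43.203s", "6m48.293s"],
    16384: ["6m24.980s", "6m24.116s"],
    32768: ["6m20.753s", "6m22.083s"],
    65536: ["6m22.913s", "6m22.449s"],
    1048576: ["6m23.765s", "6m24.645s"],
}
RUNS_83 = ["6m45.650s", "6m44.781s"]


def best(runs):
    """Fastest run of a set, in seconds (an assumed way to summarise)."""
    return min(to_seconds(r) for r in runs)


def relative_to_baseline(runs_84=RUNS_84, runs_83=RUNS_83):
    """Map each multiplier to (best 8.4 time) / (best 8.3 time)."""
    base = best(runs_83)
    return {mult: best(runs) / base for mult, runs in runs_84.items()}


if __name__ == "__main__":
    for mult, ratio in relative_to_baseline().items():
        print(f"{mult:>8}: {ratio:.3f}x the 8.3 time")
```

With the timings as listed, the ratio first drops below 1.0 at a
multiplier of 8192.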

[...]
> That's actually doing less I/O per capita, which is why it's also got
> less waiting for I/O%, but it's completing the most work. This makes me
> wonder if in addition to the ring buffering issue, there isn't just
> plain more writing per average completed transaction in 8.4 with this
> type of COPY. This might explain why even with the expanded ring buffer,
> both Stefan's and my test runs still showed a bit of a regression against
> 8.3. I'm guessing we have a second, smaller shooter here involved as well.

Well, yes, I also suspect that there is some secondary effect at play here,
and I believe I have seen the "more IO with 8.4" thing here too, but I
have not yet paid enough attention to be sure.

>
> In any case, a bump of the ring multiplier to either 4096 or 8192
> eliminates the worst of the regression here, good improvement so far.

Yeah, with the above numbers I would say that 8192 should remove most if
not all of the regression. However, it seems that we might have to make
this more dynamic in the future, since the behaviour seems to depend on a
number of variables...
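
For reference, the ring multipliers under discussion can be translated
into buffer counts and bytes. This is a minimal Python sketch, assuming the
expression form `N * 1024 / BLCKSZ` quoted earlier in this message and
PostgreSQL's default BLCKSZ of 8192 bytes; the function names are
hypothetical and a build with a different block size would give different
numbers.

```python
BLCKSZ = 8192  # default PostgreSQL block size in bytes (assumed here)


def ring_buffers(multiplier, blcksz=BLCKSZ):
    """Buffers in a ring sized as multiplier * 1024 / BLCKSZ."""
    return multiplier * 1024 // blcksz


def ring_bytes(multiplier, blcksz=BLCKSZ):
    """Total ring size in bytes for the given multiplier."""
    return ring_buffers(multiplier, blcksz) * blcksz


if __name__ == "__main__":
    for mult in (256, 4096, 8192, 16384):
        kib = ring_bytes(mult) // 1024
        print(f"{mult:>6}: {ring_buffers(mult):>5} buffers, {kib} kB")
```

Under those assumptions, a multiplier of 8192 corresponds to 1024 buffers,
or 8 MB of ring.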

Stefan
