Re: 8.4 open item: copy performance regression?

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alan Li <ali(at)truviso(dot)com>
Subject: Re: 8.4 open item: copy performance regression?
Date: 2009-06-20 02:03:56
Message-ID: alpine.GSO.2.01.0906192124390.18922@westnet.com
Lists: pgsql-hackers
On Fri, 19 Jun 2009, Stefan Kaltenbrunner wrote:

> In my case both the CPU (an Intel E5530 Nehalem) and the IO subsystem 
> (8GB Fiberchannel connected NetApp with 4GB cache) are pretty fast.

The server Alan identified as "Solaris 10 8/07 s10x_u4wos_12b X86" has a 
Xeon E5320 (1.86GHz) and a single boring SAS drive in it (Sun X4150). 
The filesystem involved in that particular case is UFS, which I am 
suspicious of as being part of why the problem is so pronounced there--the 
default UFS tuning is pretty lightweight in terms of how much caching it 
does.  Not sure if Alan ran any tests against the big ZFS volume on the 
other server; I think all the results he posted were from the UFS boot 
drive there too.

> so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more 
> or less the same performance that 8.3 had.

It looks like the sweet spot is a touch higher on our 8/07 system; it 
levels out at 8192 * (haven't checked the other one yet).  I'm seeing the 
following, using Alan's original test set size (to make sure I was running 
exactly the same test) and just grabbing the low/high from a set of 3 runs:

8.3.7:  0m39.266s   0m43.269s (alan:  36.2 - 39.2)

256:    0m50.435s   0m51.944s (alan:  48.1 - 50.6)
1024:   0m47.299s   0m49.346s
4096:   0m43.725s   0m46.116s
8192:   0m40.715s   0m42.480s
16384:  0m41.318s   0m42.118s
65536:  0m41.675s   0m42.955s
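
For anyone following along at home, the numbers in the left column are the 
KB value plugged into the "N * 1024 / BLCKSZ" ring size expression Stefan 
quoted.  Here's a quick standalone sketch of what each setting works out to 
in buffers with the default 8K BLCKSZ--just the arithmetic, not the actual 
freelist.c code, so the names here are stand-ins:

/* Standalone sketch of the bulk-write ring sizing being tested here;
 * it mirrors the "N * 1024 / BLCKSZ" expression, not the real PostgreSQL
 * source.  BLCKSZ and the strategy type names are stand-ins. */
#include <stdio.h>

#define BLCKSZ 8192                     /* default PostgreSQL block size */

typedef enum { BAS_BULKREAD, BAS_BULKWRITE, BAS_VACUUM } BufferAccessStrategyType;

static int
ring_size_for(BufferAccessStrategyType btype, int bulkwrite_kb)
{
    switch (btype)
    {
        case BAS_BULKWRITE:
            return bulkwrite_kb * 1024 / BLCKSZ;    /* the knob under test */
        default:
            return 256 * 1024 / BLCKSZ;             /* 32 buffers at 8K */
    }
}

int
main(void)
{
    int kb[] = {256, 1024, 4096, 8192, 16384, 65536};

    for (int i = 0; i < 6; i++)
        printf("%6d KB multiplier = %5d buffer ring\n",
               kb[i], ring_size_for(BAS_BULKWRITE, kb[i]));
    return 0;
}

So the sweet spot around 4096-8192 amounts to a ring of 512-1024 buffers 
(4-8MB), versus the 32 buffers (256KB) you get from the current default of 
256.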

I collected some iostat data here as well for some of the runs (the vmstat 
data was harder to read, this being Solaris, and didn't seem to add 
anything).  I'm seeing lines like this with the default ring buffer of 256 
*:

    tty        sd1           sd2           nfs1           cpu
  tin tout kps tps serv  kps tps serv  kps tps serv   us sy wt id
    0  322  12   1    0  41371 2754    0    0   0    0   12 11  0 78
    0  166   0   0    0  46246 3380    0    0   0    0   14 10  0 76
    0  164   0   0    0  44874 3068    1    0   0    0   13  9  0 78

Obviously sd2 is where the database and source file are.  Basically, about 
one core (out of four) is tied up, with a pretty even split of user/system 
time (12-14% us plus 9-11% sy is roughly a quarter of the box).  The 
highest ring size I tried, 65536 *, gives lines that look like this:

    tty        sd1           sd2           nfs1           cpu
  tin tout kps tps serv  kps tps serv  kps tps serv   us sy wt id
    0  163   0   0    0  56696 4291    0    0   0    0   20 12  0 68
    0  166   0   0    0  58554 4542    0    0   0    0   21 12  0 67
    0  168   0   0    0  56057 4308    0    0   0    0   21 12  0 67

So it seems like increasing the ring size helps saturate the disks better; 
throughput went from ~45MB/s to ~57MB/s.  What's interesting is to compare 
this against the 8.3.7 run, which is the fastest of them all and which I 
was expecting to have the highest write rate:

    tty        sd1           sd2           nfs1           cpu
  tin tout kps tps serv  kps tps serv  kps tps serv   us sy wt id
    0   83   0   0    0  47654 2121    0    0   0    0   23  8  0 69
    0  240   0   0    0  44198 2150    1    0   0    0   19  8  0 73
    0   83   0   0    0  37750 1110    1    0   0    0   21  6  0 72

That's actually doing less I/O overall, which is why it also shows a lower 
I/O wait percentage, yet it's completing the most work.  This makes me 
wonder if, in addition to the ring buffering issue, there isn't just plain 
more writing per average completed transaction in 8.4 with this type of 
COPY.  That might explain why, even with the expanded ring buffer, both 
Stefan's test runs and mine still showed a bit of a regression against 
8.3.  I'm guessing we have a second, smaller shooter involved here as 
well.
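
Putting very rough numbers on that, just multiplying the iostat rates by 
the wall clock times (so treat this as back-of-the-envelope only, and note 
kps counts the source file reads too since both live on sd2):

   8.4, 65536 ring:  ~57MB/s * ~42s  = ~2.4GB moved through sd2
   8.4, 256 ring:    ~45MB/s * ~51s  = ~2.3GB
   8.3.7:            ~43MB/s * ~41s  = ~1.8GB

If those samples are representative of the whole runs, 8.4 is moving 
roughly a quarter to a third more data through the disk to finish the same 
COPY, regardless of the ring size.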

In any case, a bump of the ring multiplier to either 4096 or 8192 
eliminates the worst of the regression here; good improvement so far.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
