Re: 8.4 open item: copy performance regression?

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alan Li <ali(at)truviso(dot)com>
Subject: Re: 8.4 open item: copy performance regression?
Date: 2009-06-21 10:48:08
Message-ID: 4A3E0FE8.6000508@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Stefan Kaltenbrunner wrote:
> Simon Riggs wrote:
>> On Sun, 2009-06-21 at 10:28 +0200, Stefan Kaltenbrunner wrote:
>>> I did some limited testing on that but I was unable to measure any
>>> significant effect - especially since the difference between
>>> wal-logged and not is rather small for a non-parallel COPY (ie in the
>>> above example you get around 6m20s runtime for wal-logged and ~5m40s
>>> in the other case).
>>
>> This is a common confusion for small tests.
>>
>> Non-WAL logged case causes all buffers to be written to disk at end of
>> COPY. This is roughly the same size as the volume of WAL written. In
>> logged case we do not write data blocks, they get written at next
>> checkpoint. So the reduction in I/O is not apparent, since during the
>> period of the test the I/O is about the same in both cases and less I/O
>> in the non-WAL logged case. On longer tests the difference shows more
>> clearly because the data blocks start to migrate out of shared buffers
>> while the COPY is still running, effecting the test results.
>
> I was actually testing with and without explicit CHECKPOINTing
> before/after the load(and also with longer runs) too - the difference is
> negligible especially with only one process involved.
> I think the difference is simply not that large because we are still
> mostly CPU bound within COPY on reasonably fast IO-subsystems.

hmm to further demonstrate that I just did some testing(same config as
before and the 16MB for the buffer) by loading those 60M rows into a
20GB ramdisk instead of the SAN(with a CHECKPOINT before and after).

this results in the following "improvements":

16384:
wal bypass: 5min40s -> 5min10s (~9%)
wal logged: 6min20s -> 6min8s (~3%)

vmstat 5 output shows that the system is in fact CPU bound (ie using ~6%
which is more or less a full core on a 16 core box) and not doing
anything IO-wise.

r b swpd free buff cache si so bi bo in cs us sy
id wa st
1 0 0 19010444 118648 15415684 0 0 0 6 1068 2151
6 0 94 0 0
1 0 0 18870448 118652 15555204 0 0 0 6 1069 2142
6 0 94 0 0
1 0 0 18730568 118684 15694872 0 0 0 185 1080 2151
6 0 94 0 0
1 0 0 18591236 118692 15834516 0 0 0 19 1072 2144
6 0 94 0 0
1 0 0 18451472 118696 15973532 0 0 0 46 1073 2152
6 0 94 0 0
1 0 0 18311720 118704 16113204 0 0 0 7 1059 2136
6 0 94 0 0
1 0 0 18171968 118704 16252944 0 0 0 0 1077 2171
6 0 94 0 0
1 0 0 18032088 118712 16392300 0 0 0 54 1062 2138
6 0 94 0 0
1 0 0 17891716 118720 16532060 0 0 0 8 1078 2176
6 0 94 0 0

So I do think that IO is in fact not too significant for this kind of
testing and we still have ways to go in terms of CPU efficiency in COPY.

Stefan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message rct682 2009-06-21 10:54:57 enquery for timezone GMT
Previous Message Stefan Kaltenbrunner 2009-06-21 09:17:01 Re: 8.4 open item: copy performance regression?