Re: 8.4 open item: copy performance regression?

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alan Li <ali(at)truviso(dot)com>
Subject: Re: 8.4 open item: copy performance regression?
Date: 2009-06-21 16:53:56
Message-ID: 4A3E65A4.5030101@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> I wonder if using the small ring showed any benefit when the COPY is not
>> WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen,
>> so the small ring might have some L2 cache benefits.
>
> I think the notion that we might get a cache win from a smaller ring
> is an illusion. We're not expecting to go back and re-read from a
> previously filled page in this scenario. In any case, all of the
> profiling results so far show that the CPU bottlenecks are elsewhere.
> Until we can squeeze an order of magnitude out of COPY's data parsing
> and/or XLogInsert, any possible cache effects will be down in the noise.

We also need to take a serious look at our locking overhead: WAL-logged
COPY already takes a significant performance hit with just a second
process running in parallel (into a separate table).
I just did some testing using the 16MB buffer, the postgresql.conf
mentioned upthread, and a 20GB tmpfs.

The following numbers are for copying 3M rows per process, with each
process loading into a separate table in the same database:

processes  total time (s)     rows/s  rows/s per core
        1            17.5  171428.57        171428.57
        2            20.8  288461.54        144230.77
        4            25.5  470588.24        117647.06
        6            31.1  578778.14         96463.02
        8            41.4  579710.14         72463.77
       10              63  476190.48         47619.05
       12              89  404494.38         33707.87
       14             116  362068.97         25862.07
       16             151  317880.79         19867.55
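The derived columns follow directly from the measured wall-clock times: aggregate rows/s is (processes × 3M rows) / total time, and the per-core figure divides that by the process count. A small Python sketch, assuming every process loads exactly 3,000,000 rows, recomputes them from the timings:

```python
# Measured wall-clock times (seconds) from the table, keyed by the number
# of concurrent COPY processes.
TIMES = {1: 17.5, 2: 20.8, 4: 25.5, 6: 31.1, 8: 41.4,
         10: 63, 12: 89, 14: 116, 16: 151}

# Assumption: each process copies exactly 3M rows into its own table.
ROWS_PER_PROCESS = 3_000_000


def throughput(processes, seconds):
    """Return (aggregate rows/s, rows/s per process) for one test run."""
    total = processes * ROWS_PER_PROCESS / seconds
    return total, total / processes


for n, t in TIMES.items():
    total, per_proc = throughput(n, t)
    print(f"{n:>2} procs  {t:>6} s  {total:>10.2f} rows/s  {per_proc:>10.2f} rows/s/proc")

# Process count with the highest aggregate throughput.
peak = max(TIMES, key=lambda n: throughput(n, TIMES[n])[0])
print("peak aggregate throughput at", peak, "processes")
```

Aggregate throughput tops out at 6 to 8 processes and falls off after that, while the per-process rate drops steadily from the very first added process.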

The higher the process count, the more erratically the box behaves: it
shows a very high context switch rate (between 300,000 and 400,000/s) and
a large amount of idle time (>60%!).

Example vmstat 5 output for the 12-process test:

 r  b swpd     free  buff    cache si so bi  bo   in     cs us sy id wa st
 7  0    0 21654500 45436 12932516  0  0  0   3 1079 336941 34  7 59  0  0
 6  0    0 21354044 45444 13232444  0  0  0  52 1068 341836 35  7 59  0  0
 4  0    0 21053832 45452 13531472  0  0  0  23 1082 341672 35  7 59  0  0
 9  0    0 20751136 45460 13833336  0  0  0  41 1063 344117 35  7 59  0  0
 6  0    0 20443856 45468 14138116  0  0  0  14 1079 349398 35  7 58  0  0
 8  0    0 20136592 45476 14444644  0  0  0   8 1060 351569 35  7 58  0  0
10  0    0 19836600 45484 14743320  0  0  0 144 1086 341533 35  7 58  0  0
 7  0    0 19540472 45492 15039616  0  0  0  94 1067 337731 36  7 58  0  0
 2  0    0 19258244 45500 15321156  0  0  0  15 1079 311394 34  6 60  0  0
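The context-switch and idle figures can be read straight off the vmstat samples. The following Python sketch, assuming the usual procps vmstat column order (r b swpd free buff cache si so bi bo in cs us sy id wa st), parses the 12-process sample and summarises the cs (context switches/s) and id (idle %) columns:

```python
# Assumed column order of procps vmstat output (17 fields, including "st").
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# The 12-process samples, one vmstat line per sample.
SAMPLE = """\
7 0 0 21654500 45436 12932516 0 0 0 3 1079 336941 34 7 59 0 0
6 0 0 21354044 45444 13232444 0 0 0 52 1068 341836 35 7 59 0 0
4 0 0 21053832 45452 13531472 0 0 0 23 1082 341672 35 7 59 0 0
9 0 0 20751136 45460 13833336 0 0 0 41 1063 344117 35 7 59 0 0
6 0 0 20443856 45468 14138116 0 0 0 14 1079 349398 35 7 58 0 0
8 0 0 20136592 45476 14444644 0 0 0 8 1060 351569 35 7 58 0 0
10 0 0 19836600 45484 14743320 0 0 0 144 1086 341533 35 7 58 0 0
7 0 0 19540472 45492 15039616 0 0 0 94 1067 337731 36 7 58 0 0
2 0 0 19258244 45500 15321156 0 0 0 15 1079 311394 34 6 60 0 0
"""


def parse(text):
    """Turn vmstat data lines into dicts, skipping header or partial lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


rows = parse(SAMPLE)
cs = [r["cs"] for r in rows]
idle = [r["id"] for r in rows]
print(f"context switches/s: min {min(cs)}, max {max(cs)}, mean {sum(cs) / len(cs):.0f}")
print(f"idle %: min {min(idle)}, max {max(idle)}")
```

For these nine samples the cs column stays between 311,394 and 351,569 switches per second.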

Stefan
