Re: COPY speedup

From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY speedup
Date: 2009-08-13 10:01:53
Message-ID: op.uylh5fq8cke6l8@soyouz
Lists: pgsql-hackers


>> But when I see a big red button, I just press it to see what happens.
>> Ugly hacks are useful to know how fast the thing can go ; then the
>> interesting part is to reimplement it cleanly, trying to reach the
>> same performance...
>
> Right -- now that you've shown a 6x speedup increase, it is clear that
> it makes sense to attempt a reimplementation. It also means it makes
> sense to have an additional pair or two of input/output functions.

Okay.

Here are some numbers. The tables are the same as in the previous email,
and the results for "copy patch 4", aka "API hack", are included again
for reference.

I benchmarked these :

* p5 = no api changes, COPY TO optimized :
- Optimizations in COPY (fast buffer, far fewer fwrite() calls, etc)
remain.
- SendFunction API reverted to original state (actually, the API changes
are still there, but deactivated, fcinfo->context = NULL).

=> small performance gain ; of course the lower per-row overhead is more
visible on "test_one_int", because that table has 1 column.
=> the (still huge) distance between p5 and "API hack" is split between
overhead in pq_send*+stringInfo (that we will tackle below) and palloc()
overhead (that was removed by the "API hack" by passing the destination
buffer directly).
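
To make the palloc() point concrete, here is a minimal standalone sketch,
not the patch code. The names send_int4_alloc and send_int4_into are
hypothetical, and malloc() stands in for palloc(). The first function
allocates a result per value, the second writes straight into a buffer
the caller already owns, which is the idea behind the "API hack".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the stock path: each value gets its own
 * allocation, which the caller must then copy out and release. */
unsigned char *send_int4_alloc(int32_t value, size_t *len)
{
    unsigned char *out = malloc(4);     /* stands in for palloc() */
    uint32_t v = (uint32_t) value;

    assert(out != NULL);
    out[0] = (unsigned char) (v >> 24); /* network byte order */
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
    *len = 4;
    return out;
}

/* Hypothetical stand-in for the "API hack": the caller passes the
 * destination, so there is no allocation and no extra copy. */
size_t send_int4_into(unsigned char *dst, int32_t value)
{
    uint32_t v = (uint32_t) value;

    assert(dst != NULL);
    dst[0] = (unsigned char) (v >> 24);
    dst[1] = (unsigned char) (v >> 16);
    dst[2] = (unsigned char) (v >> 8);
    dst[3] = (unsigned char) v;
    return 4;                           /* bytes written */
}
```

Both produce the same four bytes; the difference is only who owns the
memory, which is where the per-value overhead goes.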

* p6 = p5 + optimization of pq_send*
- inlining strategic functions
- probably benefits many other code paths

=> small incremental performance gain

* p7 = p6 + optimization of StringInfo
- inlining strategic functions
- probably benefits many other code paths

=> small incremental performance gain (they start to add up nicely)
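
As an illustration of what "inlining strategic functions" can look like,
here is a small sketch under my own assumptions. MiniBuf and its
functions are hypothetical names, not the StringInfo API. The common
case (enough room) is a static inline fast path, and only the rare
enlarge step stays out of line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, loosely in the spirit of StringInfo. */
typedef struct MiniBuf
{
    char   *data;
    size_t  len;    /* bytes used */
    size_t  cap;    /* bytes allocated */
} MiniBuf;

void minibuf_init(MiniBuf *b, size_t cap)
{
    assert(cap > 0);
    b->data = malloc(cap);
    assert(b->data != NULL);
    b->len = 0;
    b->cap = cap;
}

/* Slow path, kept out of line: double the capacity until it fits. */
void minibuf_grow(MiniBuf *b, size_t needed)
{
    size_t  cap = b->cap;
    char   *p;

    while (cap - b->len < needed)
        cap *= 2;
    p = realloc(b->data, cap);
    assert(p != NULL);
    b->data = p;
    b->cap = cap;
}

/* Fast path, inlined at the call site: one comparison, one memcpy. */
static inline void minibuf_append(MiniBuf *b, const void *src, size_t n)
{
    if (b->cap - b->len < n)
        minibuf_grow(b, n);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}
```

With small fixed-size appends (as for int columns), the compiler can
fold the call overhead away and the grow branch is almost never taken.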

* p8 = p7 + optimization of palloc()
- actually this is extremely dumb :
- int4send and int2send simply palloc() 16 bytes instead of 1024......
- the initial size of the allocset is 64K instead of 8K

=> still it has interesting results...

The three patches above are quite simple (especially the inlines), and yet
the speedup is already nice.

* p9 = p8 + monstrously ugly hack
COPY looks at the sendfunc, notices it's int{2,4}send, and replaces it
with int{2,4}fastsend, which is called directly from C, bypassing the fmgr
(urrrgghhhhhh).
Of course it only works for ints.
This gives information about fmgr overhead : fmgr is pretty damn fast.
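
The shape of that hack, as a standalone sketch and not the patch itself:
CallInfo, SendFn and send_column are hypothetical names standing in for
the fmgr call convention and the COPY inner loop. The caller compares
the function pointer against a known send function and, on a match,
calls a plain C variant directly instead of going through the generic
call-info path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic calling convention (stand-in for fmgr). */
typedef struct CallInfo
{
    int32_t        arg;
    unsigned char *out;
} CallInfo;

typedef size_t (*SendFn) (CallInfo *ci);

static void put_be32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char) (v >> 24);
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
}

/* Generic variant: arguments arrive packed in a CallInfo. */
size_t int4send_generic(CallInfo *ci)
{
    put_be32(ci->out, (uint32_t) ci->arg);
    return 4;
}

/* Direct variant: plain C arguments, no call-info setup. */
size_t int4send_fast(int32_t value, unsigned char *out)
{
    put_be32(out, (uint32_t) value);
    return 4;
}

/* The "ugly hack": recognize the known function and bypass the
 * generic path; anything else still goes through the pointer. */
size_t send_column(SendFn fn, int32_t value, unsigned char *out)
{
    assert(fn != NULL && out != NULL);
    if (fn == int4send_generic)
        return int4send_fast(value, out);

    CallInfo ci = { value, out };
    return fn(&ci);
}
```

The output is identical either way; only the call path changes, which is
why the p9 vs p8 difference says something about call overhead rather
than about the conversion itself.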

* p10 no copy
does everything except calling the SendFuncs, it writes dummy data instead.
This gives the time used in everything except the SendFuncs : table scan,
deform_tuple, file writes, etc, which is an interesting thing to know.

RESULTS :

COPY annonces TO '/dev/null' BINARY :
Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|-------------------------------------
   2.149 |  2.60 x |     151.57 |  192.40 |      7.50 | copy to patch 4
   3.055 |  1.83 x |     106.64 |  135.37 |      5.28 | p8 = p7 + optimization of palloc()
   3.202 |  1.74 x |     101.74 |  129.15 |      5.04 | p7 = p6 + optimization of StringInfo
   3.754 |  1.49 x |      86.78 |  110.15 |      4.30 | p6 = p5 + optimization of pq_send*
   4.434 |  1.26 x |      73.47 |   93.26 |      3.64 | p5 no api changes, COPY TO optimized
   5.579 |     --- |      58.39 |   74.12 |      2.89 | compiled from source

COPY archive_data TO '/dev/null' BINARY :
Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|-------------------------------------
   5.372 |  3.75 x |      73.96 |  492.88 |     13.80 | copy to patch 4
   8.545 |  2.36 x |      46.49 |  309.83 |      8.68 | p8 = p7 + optimization of palloc()
  10.229 |  1.97 x |      38.84 |  258.82 |      7.25 | p7 = p6 + optimization of StringInfo
  12.869 |  1.57 x |      30.87 |  205.73 |      5.76 | p6 = p5 + optimization of pq_send*
  15.559 |  1.30 x |      25.54 |  170.16 |      4.76 | p5 no api changes, COPY TO optimized
  20.165 |     --- |      19.70 |  131.29 |      3.68 | 8.4.0 / compiled from source

COPY test_one_int TO '/dev/null' BINARY :
Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|-------------------------------------
   1.493 |  4.23 x |     205.25 | 6699.22 |      6.70 | p10 no copy
   1.660 |  3.80 x |     184.51 | 6022.33 |      6.02 | p9 monstrously ugly hack
   2.003 |  3.15 x |     152.94 | 4991.87 |      4.99 | copy to patch 4
   2.803 |  2.25 x |     109.32 | 3568.03 |      3.57 | p8 = p7 + optimization of palloc()
   2.976 |  2.12 x |     102.94 | 3360.05 |      3.36 | p7 = p6 + optimization of StringInfo
   3.165 |  2.00 x |      96.82 | 3160.05 |      3.16 | p6 = p5 + optimization of pq_send*
   3.698 |  1.71 x |      82.86 | 2704.43 |      2.70 | p5 no api changes, COPY TO optimized
   6.318 |     --- |      48.49 | 1582.85 |      1.58 | 8.4.0 / compiled from source

COPY test_many_ints TO '/dev/null' BINARY :
Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|-------------------------------------
   1.007 |  8.80 x |     127.23 |  993.34 |     25.83 | p10 no copy
   1.114 |  7.95 x |     114.95 |  897.52 |     23.34 | p9 monstrously ugly hack
   1.706 |  5.19 x |      75.08 |  586.23 |     15.24 | copy to patch 4
   3.396 |  2.61 x |      37.72 |  294.49 |      7.66 | p8 = p7 + optimization of palloc()
   4.588 |  1.93 x |      27.92 |  217.98 |      5.67 | p7 = p6 + optimization of StringInfo
   5.821 |  1.52 x |      22.00 |  171.80 |      4.47 | p6 = p5 + optimization of pq_send*
   6.890 |  1.29 x |      18.59 |  145.14 |      3.77 | p5 no api changes, COPY TO optimized
   8.861 |     --- |      14.45 |  112.85 |      2.93 | 8.4.0 / compiled from source
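
For anyone rechecking the tables: the Speedup column appears to be the
baseline time (the "compiled from source" row) divided by the time of
the row in question. The helper below is a hypothetical one-liner for
that arithmetic, not something from the patches.

```c
#include <assert.h>

/* Speedup as the ratio of unpatched time to patched time, both in
 * seconds. Assumes this is how the Speedup column was derived. */
double speedup(double baseline_s, double patched_s)
{
    assert(baseline_s > 0.0 && patched_s > 0.0);
    return baseline_s / patched_s;
}
```

For example, speedup(5.579, 2.149) is about 2.60 (annonces, patch 4),
and speedup(8.861, 1.007) is about 8.80 (test_many_ints, p10).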
