Re: WAL format and API changes (9.5)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WAL format and API changes (9.5)
Date: 2014-11-17 19:31:41
Message-ID: 546A4D1D.4010207@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/13/2014 05:04 PM, Andres Freund wrote:
> On 2014-11-13 15:33:44 +0200, Heikki Linnakangas wrote:
>> That also seems to have more than bought back the performance
>> regression I saw earlier.
>
> I think we really need to do the performance test with a different CRC
> implementation. So far all the tests in this thread seem to be
> correlating pretty linearly to the size of the record.

Ok, I ran some tests comparing this patch, plus a quick & dirty patch to
use the Intel CRC instruction for CRC calculation. The quick & dirty is
attached. It's based on code you sent me over IM some weeks ago, and
needs -msse4.2 to compile. The point of the CRC patch is to just "hide"
the effect of simply reducing the WAL volume, to concentrate on the
possible overhead on constructing and replaying the new format.

WAL insertion performance
=========================

To measure the performance of generating WAL, I ran the
wal-update-testsuite.sh that Amit also ran earlier. The cluster was
initialized with:

shared_buffers=512MB
checkpoint_segments=30
fsync=off
autovacuum=off
full_page_writes=off

master + intel-crc.patch
------------------------

testname | wal_generated | duration

-----------------------------------------+---------------+------------------
two short fields, no change | 388523496 | 5.98275089263916
two short fields, no change | 384400536 | 5.81247496604919
two short fields, no change | 394318384 | 5.83137106895447
two short fields, one changed | 424519456 | 5.94383692741394
two short fields, one changed | 427103960 | 6.0239098072052
two short fields, one changed | 431826936 | 5.99587488174438
two short fields, both changed | 424524232 | 6.24820590019226
two short fields, both changed | 424520616 | 6.27542090415955
two short fields, both changed | 429219024 | 6.24310803413391
one short and one long field, no change | 78215672 | 1.69482898712158
one short and one long field, no change | 79275744 | 1.70809984207153
one short and one long field, no change | 76444464 | 1.6900041103363
ten tiny fields, all changed | 468263328 | 7.33222103118896
ten tiny fields, all changed | 475685120 | 7.44711399078369
ten tiny fields, all changed | 466329808 | 7.44984602928162
hundred tiny fields, all changed | 181802520 | 3.48464608192444
hundred tiny fields, all changed | 169941080 | 3.57416915893555
hundred tiny fields, all changed | 170178472 | 3.51270413398743
hundred tiny fields, half changed | 184539560 | 3.50752401351929
hundred tiny fields, half changed | 184539488 | 3.51076292991638
hundred tiny fields, half changed | 182090680 | 3.50708913803101
hundred tiny fields, half nulled | 104020856 | 3.43046092987061
hundred tiny fields, half nulled | 100607288 | 3.2797839641571
hundred tiny fields, half nulled | 97194600 | 3.26747107505798
9 short and 1 long, short changed | 99243976 | 1.94056487083435
9 short and 1 long, short changed | 95552912 | 1.94364786148071
9 short and 1 long, short changed | 100634736 | 1.94284915924072
(27 rows)

wal-format-and-api-changes-12.patch + crc-intel.patch
-----------------------------------------------------

testname | wal_generated | duration

-----------------------------------------+---------------+------------------
two short fields, no change | 344717720 | 6.01146483421326
two short fields, no change | 351999048 | 5.94603586196899
two short fields, no change | 359280088 | 5.92386198043823
two short fields, one changed | 349783736 | 5.98722791671753
two short fields, one changed | 357064632 | 5.95086097717285
two short fields, one changed | 347568640 | 5.92789196968079
two short fields, both changed | 388404496 | 5.9715678691864
two short fields, both changed | 389122792 | 5.99439883232117
two short fields, both changed | 389840936 | 5.98658204078674
one short and one long field, no change | 56844552 | 1.69335913658142
one short and one long field, no change | 67099160 | 1.66509008407593
one short and one long field, no change | 65557168 | 1.67773389816284
ten tiny fields, all changed | 387205416 | 7.0555100440979
ten tiny fields, all changed | 396668040 | 7.05817604064941
ten tiny fields, all changed | 389353592 | 6.98230600357056
hundred tiny fields, all changed | 163675416 | 3.4382529258728
hundred tiny fields, all changed | 173227296 | 3.4594509601593
hundred tiny fields, all changed | 172800520 | 3.44062900543213
hundred tiny fields, half changed | 174070216 | 3.51688480377197
hundred tiny fields, half changed | 169965336 | 3.47681999206543
hundred tiny fields, half changed | 175331128 | 3.45153713226318
hundred tiny fields, half nulled | 84784272 | 3.26745700836182
hundred tiny fields, half nulled | 83732928 | 3.22160506248474
hundred tiny fields, half nulled | 80339600 | 3.22326803207397
9 short and 1 long, short changed | 98087384 | 2.00620484352112
9 short and 1 long, short changed | 101273968 | 1.99480700492859
9 short and 1 long, short changed | 93123424 | 1.99506211280823
(27 rows)

Summary: No meaningful difference in runtime.

WAL replay performance
======================

To test WAL replay performance, I ran pgbench with WAL archiving
enabled, and timed the replay of the generated WAL. I used the attached
script, replay-perf-test.sh for that. full_page_writes were disabled,
because replaying full page images is quite different from replaying
other records. (Performance of full-page images is interesting too, but
it's not expected that these WAL format changes make much difference to
that).

(This is the same test setup I used in 54611209(dot)705(at)vmware(dot)com(dot))

I repeated the replay three times, and grepped for the "redo starts" and
"redo done" messages in the log. The results:

master + crc-intel.patch
------------------------

2014-11-17 20:40:10.175 EET LOG: redo starts at 0/2000090
2014-11-17 20:41:04.327 EET LOG: redo done at 0/7A0000C8

2014-11-17 20:42:46.319 EET LOG: redo starts at 0/2000090
2014-11-17 20:43:37.973 EET LOG: redo done at 0/7A0000C8

2014-11-17 20:44:14.889 EET LOG: redo starts at 0/2000090
2014-11-17 20:45:07.161 EET LOG: redo done at 0/7A0000C8

wal-format-and-api-changes-12.patch + crc-intel.patch
-----------------------------------------------------

2014-11-17 20:20:54.198 EET LOG: redo starts at 0/2000090
2014-11-17 20:21:49.073 EET LOG: redo done at 0/6A0000C8

2014-11-17 20:23:09.438 EET LOG: redo starts at 0/2000090
2014-11-17 20:24:04.744 EET LOG: redo done at 0/6A0000C8

2014-11-17 20:25:24.076 EET LOG: redo starts at 0/2000090
2014-11-17 20:26:18.830 EET LOG: redo done at 0/6A0000C8

In summary, there is no significant difference in replay performance.
The amount of WAL generated is much smaller with the patch.

For completeness, attached is the version of the patch that I used for
these tests. It's the same as patch version 11, but I fixed one trivial
bug in a b-tree redo function.

This concludes my performance testing, until someone wants to see some
other scenario being tested. I'm happy with the results.

- Heikki

Attachment Content-Type Size
replay-perf-test.sh application/x-shellscript 1.1 KB
crc-intel.patch text/x-diff 1.9 KB
wal-format-and-api-changes-12.patch.gz application/gzip 106.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-11-17 19:32:01 Re: group locking: incomplete patch, just for discussion
Previous Message Robert Haas 2014-11-17 19:08:39 Re: [HACKERS] Performance issue with libpq prepared queries on 9.3 and 9.4