Re: checkpointer continuous flushing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-01-11 19:27:42
Message-ID: 20160111192742.GD10941@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-01-11 14:45:16 +0100, Andres Freund wrote:
> On 2016-01-09 16:49:56 +0100, Fabien COELHO wrote:
> > >Hm. New theory: The current flush interface does the flushing inside
> > >FlushBuffer()->smgrwrite()->mdwrite()->FileWrite()->FlushContextSchedule(). The
> > >problem with that is that at that point we (need to) hold a content lock
> > >on the buffer!
> >
> > You are worrying that FlushBuffer is holding a lock on a buffer and the
> > "sync_file_range" call occurs is issued at that moment.
> >
> > Although I agree that it is not that good, I would be surprise if that was
> > the explanation for a performance regression, because the sync_file_range
> > with the chosen parameters is an async call, it "advises" the OS to send the
> > file, but it does not wait for it to be completed.
>
> I frequently see sync_file_range blocking - it waits till it could
> submit the writes into the io queues. On a system bottlenecked on IO
> that's not always possible immediately.
>
> > Also, maybe you could answer a question I had about the performance
> > regression you observed, I could not find the post where you gave the
> > detailed information about it, so that I could try reproducing it: what are
> > the exact settings and conditions (shared_buffers, pgbench scaling, host
> > memory, ...), what is the observed regression (tps? other?), and what is the
> > responsiveness of the database under the regression (eg % of seconds with 0
> > tps for instance, or something like that).
>
> I measured it in a different number of cases, both on SSDs and spinning
> rust. I just reproduced it with:
>
> postgres-ckpt14 \
> -D /srv/temp/pgdev-dev-800/ \
> -c maintenance_work_mem=2GB \
> -c fsync=on \
> -c synchronous_commit=off \
> -c shared_buffers=2GB \
> -c wal_level=hot_standby \
> -c max_wal_senders=10 \
> -c max_wal_size=100GB \
> -c checkpoint_timeout=30s
>
> Using a fresh cluster each time (copied from a "template" to save time)
> and using
> pgbench -M prepared -c 16 -j16 -T 300 -P 1
> I get
>
> My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
> master:
> scaling factor: 800
> query mode: prepared
> number of clients: 16
> number of threads: 16
> duration: 300 s
> number of transactions actually processed: 1155733
> latency average: 4.151 ms
> latency stddev: 8.712 ms
> tps = 3851.242965 (including connections establishing)
> tps = 3851.725856 (excluding connections establishing)
>
> ckpt-14 (flushing by backends disabled):
> scaling factor: 800
> query mode: prepared
> number of clients: 16
> number of threads: 16
> duration: 300 s
> number of transactions actually processed: 855156
> latency average: 5.612 ms
> latency stddev: 7.896 ms
> tps = 2849.876327 (including connections establishing)
> tps = 2849.912015 (excluding connections establishing)

Hm. I think I have an entirely different theory that might explain some
of this theory. I instrumented lwlocks to check for additional blocking
and found some. Admittedly not exactly where I thought it might
be. Check out what you can observe when adding/enabling an elog in
FlushBuffer() (and the progress printing from BufferSync()):

(sorry, a bit long, but it's necessary to understand)

[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 0 of relation base/13000/16387
to_scan: 131141, scanned: 6, %processed: 0.00, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG: xlog flush request 1F/D2FD7E0; write 1F/D296000; flush 1F/D296000; insert: 1F/D33B418
[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 2 of relation base/13000/16387
to_scan: 131141, scanned: 7, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG: xlog flush request 1F/D3B2E30; write 1F/D33C000; flush 1F/D33C000; insert: 1F/D403198
[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 3 of relation base/13000/16387
to_scan: 131141, scanned: 9, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG: xlog flush request 1F/D469990; write 1F/D402000; flush 1F/D402000; insert: 1F/D4FDD00
[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 5 of relation base/13000/16387
to_scan: 131141, scanned: 11, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG: xlog flush request 1F/D5663E8; write 1F/D4FC000; flush 1F/D4FC000; insert: 1F/D5D1390
[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 7 of relation base/13000/16387
to_scan: 131141, scanned: 14, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG: xlog flush request 1F/D673700; write 1F/D5D0000; flush 1F/D5D0000; insert: 1F/D687E58
[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 10 of relation base/13000/16387
to_scan: 131141, scanned: 15, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG: xlog flush request 1F/D76BEC8; write 1F/D686000; flush 1F/D686000; insert: 1F/D7A83A0
[2016-01-11 20:15:02 CET][14957] CONTEXT: writing block 11 of relation base/13000/16387
to_scan: 131141, scanned: 16, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/D7AE5C0; write 1F/D7A83E8; flush 1F/D7A83E8; insert: 1F/D8B9A88
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 12 of relation base/13000/16387
to_scan: 131141, scanned: 17, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/DA08370; write 1F/D963A38; flush 1F/D963A38; insert: 1F/DA0A7D0
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 13 of relation base/13000/16387
to_scan: 131141, scanned: 18, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/DAC09A0; write 1F/DA92250; flush 1F/DA92250; insert: 1F/DB9AAC8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 14 of relation base/13000/16387
to_scan: 131141, scanned: 21, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/DCEFF18; write 1F/DC2AD30; flush 1F/DC2AD30; insert: 1F/DCF25B0
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 17 of relation base/13000/16387
to_scan: 131141, scanned: 23, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/DD0E9E0; write 1F/DCF25F8; flush 1F/DCF25F8; insert: 1F/DDD6198
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 19 of relation base/13000/16387
to_scan: 131141, scanned: 24, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/DED6A20; write 1F/DEC0358; flush 1F/DEC0358; insert: 1F/DFB64C8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 20 of relation base/13000/16387
to_scan: 131141, scanned: 25, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/DFDEE90; write 1F/DFB6560; flush 1F/DFB6560; insert: 1F/E073468
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 21 of relation base/13000/16387
to_scan: 131141, scanned: 26, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/E295638; write 1F/E10B9F8; flush 1F/E10B9F8; insert: 1F/E2B40E0
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 22 of relation base/13000/16387
to_scan: 131141, scanned: 27, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/E381688; write 1F/E354BC0; flush 1F/E354BC0; insert: 1F/E459598
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 23 of relation base/13000/16387
to_scan: 131141, scanned: 28, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/E56EF70; write 1F/E4C0C98; flush 1F/E4C0C98; insert: 1F/E56F200
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 24 of relation base/13000/16387
to_scan: 131141, scanned: 29, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/E67E538; write 1F/E5DC440; flush 1F/E5DC440; insert: 1F/E6F7FF8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 25 of relation base/13000/16387
to_scan: 131141, scanned: 31, %processed: 0.02, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/E873DD8; write 1F/E7D81F0; flush 1F/E7D81F0; insert: 1F/E8A1710
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 27 of relation base/13000/16387
to_scan: 131141, scanned: 33, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/E9E3948; write 1F/E979610; flush 1F/E979610; insert: 1F/EA27AC0
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 29 of relation base/13000/16387
to_scan: 131141, scanned: 35, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/EABDDC8; write 1F/EA6DFE0; flush 1F/EA6DFE0; insert: 1F/EB10728
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 31 of relation base/13000/16387
to_scan: 131141, scanned: 37, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/EC07328; write 1F/EBAABE0; flush 1F/EBAABE0; insert: 1F/EC9B8A8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 33 of relation base/13000/16387
to_scan: 131141, scanned: 40, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/ED18FF8; write 1F/EC9B8A8; flush 1F/EC9B8A8; insert: 1F/ED8C2F8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 36 of relation base/13000/16387
to_scan: 131141, scanned: 41, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/EEED640; write 1F/EE0BAD8; flush 1F/EE0BAD8; insert: 1F/EF35EA8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 37 of relation base/13000/16387
to_scan: 131141, scanned: 42, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/EFF20B8; write 1F/EFAAE20; flush 1F/EFAAE20; insert: 1F/F06FAC0
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 38 of relation base/13000/16387
to_scan: 131141, scanned: 43, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/F1430B0; write 1F/F0DEAB8; flush 1F/F0DEAB8; insert: 1F/F265020
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 39 of relation base/13000/16387
to_scan: 131141, scanned: 45, %processed: 0.03, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/F3556C0; write 1F/F268F68; flush 1F/F268F68; insert: 1F/F3682B8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 41 of relation base/13000/16387
to_scan: 131141, scanned: 46, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/F5005F8; write 1F/F4376F8; flush 1F/F4376F8; insert: 1F/F523838
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 42 of relation base/13000/16387
to_scan: 131141, scanned: 47, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/F6261C0; write 1F/F5A07A0; flush 1F/F5A07A0; insert: 1F/F691288
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 43 of relation base/13000/16387
to_scan: 131141, scanned: 48, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/F7CBCD0; write 1F/F719020; flush 1F/F719020; insert: 1F/F80DBB0
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 44 of relation base/13000/16387
to_scan: 131141, scanned: 49, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/F9359C8; write 1F/F874CB8; flush 1F/F874CB8; insert: 1F/F95AD58
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 45 of relation base/13000/16387
to_scan: 131141, scanned: 50, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/FA33F38; write 1F/FA03490; flush 1F/FA03490; insert: 1F/FAD4DF8
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 46 of relation base/13000/16387
to_scan: 131141, scanned: 51, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/FBDBCD8; write 1F/FB52238; flush 1F/FB52238; insert: 1F/FC54E68
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 47 of relation base/13000/16387
to_scan: 131141, scanned: 52, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/FD74B60; write 1F/FD10360; flush 1F/FD10360; insert: 1F/FDB6A88
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 48 of relation base/13000/16387
to_scan: 131141, scanned: 53, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/FE4FF60; write 1F/FDB6AD0; flush 1F/FDB6AD0; insert: 1F/FE90028
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 49 of relation base/13000/16387
to_scan: 131141, scanned: 54, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:03 CET][14957] LOG: xlog flush request 1F/FFD6A78; write 1F/FF223F0; flush 1F/FF223F0; insert: 1F/10022F70
[2016-01-11 20:15:03 CET][14957] CONTEXT: writing block 50 of relation base/13000/16387
to_scan: 131141, scanned: 55, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10144C98; write 1F/10023000; flush 1F/10023000; insert: 1F/10157730
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 51 of relation base/13000/16387
to_scan: 131141, scanned: 58, %processed: 0.04, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/102AA468; write 1F/1020C600; flush 1F/1020C600; insert: 1F/102C73F0
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 54 of relation base/13000/16387
to_scan: 131141, scanned: 60, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10313470; write 1F/102C7460; flush 1F/102C7460; insert: 1F/103D4F38
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 56 of relation base/13000/16387
to_scan: 131141, scanned: 61, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10510CE8; write 1F/104562F0; flush 1F/104562F0; insert: 1F/105171E8
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 57 of relation base/13000/16387
to_scan: 131141, scanned: 62, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10596B18; write 1F/105191B0; flush 1F/105191B0; insert: 1F/106076F8
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 58 of relation base/13000/16387
to_scan: 131141, scanned: 63, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/1073FB28; write 1F/10693638; flush 1F/10693638; insert: 1F/10787D40
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 59 of relation base/13000/16387
to_scan: 131141, scanned: 64, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/1088D058; write 1F/107F7068; flush 1F/107F7068; insert: 1F/10920EA0
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 60 of relation base/13000/16387
to_scan: 131141, scanned: 67, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/109D9158; write 1F/109A8458; flush 1F/109A8458; insert: 1F/10A8A240
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 63 of relation base/13000/16387
to_scan: 131141, scanned: 68, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10BDAA38; write 1F/10B2AD48; flush 1F/10B2AD48; insert: 1F/10C16768
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 64 of relation base/13000/16387
to_scan: 131141, scanned: 69, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10D824D0; write 1F/10C859A0; flush 1F/10C859A0; insert: 1F/10DCC860
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 65 of relation base/13000/16387
to_scan: 131141, scanned: 70, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10E24CD8; write 1F/10DCC8A8; flush 1F/10DCC8A8; insert: 1F/10EA8588
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 66 of relation base/13000/16387
to_scan: 131141, scanned: 71, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/10FD3E90; write 1F/10F57530; flush 1F/10F57530; insert: 1F/11043A58
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 67 of relation base/13000/16387
to_scan: 131141, scanned: 72, %processed: 0.05, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/111CE4A0; write 1F/11043AC8; flush 1F/11043AC8; insert: 1F/111ED470
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 68 of relation base/13000/16387
to_scan: 131141, scanned: 73, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11338080; write 1F/112917C8; flush 1F/112917C8; insert: 1F/1135CF80
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 69 of relation base/13000/16387
to_scan: 131141, scanned: 76, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11369068; write 1F/1135CF80; flush 1F/1135CF80; insert: 1F/1140BE88
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 72 of relation base/13000/16387
to_scan: 131141, scanned: 77, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/1146A420; write 1F/1136E000; flush 1F/1136E000; insert: 1F/11483530
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 73 of relation base/13000/16387
to_scan: 131141, scanned: 78, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/1157B800; write 1F/11483530; flush 1F/11483530; insert: 1F/11583E20
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 74 of relation base/13000/16387
to_scan: 131141, scanned: 79, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/116368C0; write 1F/11583E20; flush 1F/11583E20; insert: 1F/116661A8
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 75 of relation base/13000/16387
to_scan: 131141, scanned: 81, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/116FC598; write 1F/11668178; flush 1F/11668178; insert: 1F/11716758
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 0 of relation base/13000/16393
to_scan: 131141, scanned: 82, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/117DA658; write 1F/117631F0; flush 1F/117631F0; insert: 1F/118206F0
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 1 of relation base/13000/16393
to_scan: 131141, scanned: 83, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11956320; write 1F/118E96B8; flush 1F/118E96B8; insert: 1F/1196F000
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 2 of relation base/13000/16393
to_scan: 131141, scanned: 84, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11A09B00; write 1F/1196F090; flush 1F/1196F090; insert: 1F/11A23D38
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 3 of relation base/13000/16393
to_scan: 131141, scanned: 85, %processed: 0.06, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11B43C80; write 1F/11AB2148; flush 1F/11AB2148; insert: 1F/11B502D8
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 4 of relation base/13000/16393
to_scan: 131141, scanned: 86, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11BE2610; write 1F/11B503B8; flush 1F/11B503B8; insert: 1F/11BF9068
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 5 of relation base/13000/16393
to_scan: 131141, scanned: 87, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11CB9FD8; write 1F/11BF9168; flush 1F/11BF9168; insert: 1F/11CBE1F8
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 6 of relation base/13000/16393
to_scan: 131141, scanned: 88, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11D24E10; write 1F/11CBE268; flush 1F/11CBE268; insert: 1F/11D8BC18
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 7 of relation base/13000/16393
to_scan: 131141, scanned: 89, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11E9B070; write 1F/11DEC840; flush 1F/11DEC840; insert: 1F/11EB7EC0
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 8 of relation base/13000/16393
to_scan: 131141, scanned: 90, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/11F5C3F0; write 1F/11F3FBD0; flush 1F/11F3FBD0; insert: 1F/11FE1A08
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 9 of relation base/13000/16393
to_scan: 131141, scanned: 91, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/121EDC00; write 1F/1208E838; flush 1F/1208E838; insert: 1F/121F1EF8
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 10 of relation base/13000/16393
to_scan: 131141, scanned: 92, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/122E0A70; write 1F/121F1F90; flush 1F/121F1F90; insert: 1F/122E9198
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 11 of relation base/13000/16393
to_scan: 131141, scanned: 93, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/1243B698; write 1F/123A7EC8; flush 1F/123A7EC8; insert: 1F/1245E620
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 12 of relation base/13000/16393
to_scan: 131141, scanned: 94, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/1258E7B0; write 1F/124BF6B8; flush 1F/124BF6B8; insert: 1F/1259F198
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 13 of relation base/13000/16393
to_scan: 131141, scanned: 95, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/126C8E38; write 1F/12662BA0; flush 1F/12662BA0; insert: 1F/126FE690
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 14 of relation base/13000/16393
to_scan: 131141, scanned: 96, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/127DE810; write 1F/126FE6D8; flush 1F/126FE6D8; insert: 1F/128081B0
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 15 of relation base/13000/16393
to_scan: 131141, scanned: 97, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:04 CET][14957] LOG: xlog flush request 1F/12980108; write 1F/128A6000; flush 1F/128A6000; insert: 1F/129A8E00
[2016-01-11 20:15:04 CET][14957] CONTEXT: writing block 16 of relation base/13000/16393
to_scan: 131141, scanned: 98, %processed: 0.07, %writeouts: 100.00
[2016-01-11 20:15:05 CET][14957] LOG: xlog flush request 1F/12A55978; write 1F/129ACDB8; flush 1F/129ACDB8; insert: 1F/12A6A408
[2016-01-11 20:15:05 CET][14957] CONTEXT: writing block 17 of relation base/13000/16393
to_scan: 131141, scanned: 99, %processed: 0.08, %writeouts: 100.00
[2016-01-11 20:15:05 CET][14957] LOG: xlog flush request 1F/12BC1148; write 1F/12B12F40; flush 1F/12B12F40; insert: 1F/12BC15F8
[2016-01-11 20:15:05 CET][14957] CONTEXT: writing block 18 of relation base/13000/16393
to_scan: 131141, scanned: 100, %processed: 0.08, %writeouts: 100.00
[2016-01-11 20:15:05 CET][14957] LOG: xlog flush request 1F/12D36E20; write 1F/12C70120; flush 1F/12C70120; insert: 1F/12D4DC08
[2016-01-11 20:15:05 CET][14957] CONTEXT: writing block 19 of relation base/13000/16393
to_scan: 131141, scanned: 9892, %processed: 7.54, %writeouts: 100.00
[2016-01-11 20:15:05 CET][14957] LOG: xlog flush request 1F/13128AF8; write 1F/12DEE670; flush 1F/12DEE670; insert: 1F/1313B7D0
[2016-01-11 20:15:05 CET][14957] CONTEXT: writing block 101960 of relation base/13000/16396
to_scan: 131141, scanned: 18221, %processed: 13.89, %writeouts: 100.00
[2016-01-11 20:15:05 CET][14957] LOG: xlog flush request 1F/13276328; write 1F/1313A000; flush 1F/1313A000; insert: 1F/134E93A8
[2016-01-11 20:15:05 CET][14957] CONTEXT: writing block 188242 of relation base/13000/16396
to_scan: 131141, scanned: 25857, %processed: 19.72, %writeouts: 100.00
[2016-01-11 20:15:06 CET][14957] LOG: xlog flush request 1F/13497370; write 1F/1346E000; flush 1F/1346E000; insert: 1F/136C00F8
[2016-01-11 20:15:06 CET][14957] CONTEXT: writing block 267003 of relation base/13000/16396
to_scan: 131141, scanned: 26859, %processed: 20.48, %writeouts: 100.00
[2016-01-11 20:15:06 CET][14957] LOG: xlog flush request 1F/136B5BB0; write 1F/135D6000; flush 1F/135D6000; insert: 1F/136C00F8
[2016-01-11 20:15:06 CET][14957] CONTEXT: writing block 277621 of relation base/13000/16396
to_scan: 131141, scanned: 27582, %processed: 21.03, %writeouts: 100.00
[2016-01-11 20:15:06 CET][14957] LOG: xlog flush request 1F/138C6C38; write 1F/1375E900; flush 1F/1375E900; insert: 1F/138D5518
[2016-01-11 20:15:06 CET][14957] CONTEXT: writing block 285176 of relation base/13000/16396
to_scan: 131141, scanned: 28943, %processed: 22.07, %writeouts: 100.00
[2016-01-11 20:15:06 CET][14957] LOG: xlog flush request 1F/13A5B768; write 1F/138C8000; flush 1F/138C8000; insert: 1F/13AB61D0
[2016-01-11 20:15:06 CET][14957] CONTEXT: writing block 300007 of relation base/13000/16396
to_scan: 131141, scanned: 36181, %processed: 27.59, %writeouts: 100.00
[2016-01-11 20:15:06 CET][14957] LOG: xlog flush request 1F/13C320C8; write 1F/13A8A000; flush 1F/13A8A000; insert: 1F/13DAAB40
[2016-01-11 20:15:06 CET][14957] CONTEXT: writing block 375983 of relation base/13000/16396
to_scan: 131141, scanned: 40044, %processed: 30.54, %writeouts: 100.00
[2016-01-11 20:15:07 CET][14957] LOG: xlog flush request 1F/13E196C8; write 1F/13CBA000; flush 1F/13CBA000; insert: 1F/13F9E6D8
[2016-01-11 20:15:07 CET][14957] CONTEXT: writing block 416439 of relation base/13000/16396
to_scan: 131141, scanned: 48250, %processed: 36.79, %writeouts: 100.00
[2016-01-11 20:15:07 CET][14957] LOG: xlog flush request 1F/143F6160; write 1F/13EE8000; flush 1F/13EE8000; insert: 1F/1461BB08

You can see that initially every buffer triggers a WAL flush. That
causes a slowdown because a) we're doing significantly more WAL flushes
in that time period, both causing slowdown of concurrent IO and
concurrent WAL insertions b) due to the many slow flushes we get behind
on the checkpoint schedule, triggering a rapid fire period of writes
afterwards.

My theory is that this happens due to the sorting: pgbench is an update
heavy workload, the first few pages are always going to be used if
there's free space as freespacemap.c essentially prefers those. Due to
the sorting all a relation's early pages are going to be in "in a row".

Indeed, the behaviour is not visible in a significant manner when using
pgbench -N, where there are far fewer updated pages.

I'm not entirely sure how we can deal with that.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2016-01-11 19:39:14 Re: Speedup twophase transactions
Previous Message Alvaro Herrera 2016-01-11 19:22:38 Re: PATCH: Extending the HyperLogLog API a bit