From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Tracing down buildfarm "postmaster does not shut down" failures
Date: 2016-02-10 15:55:10
Message-ID: 26216.1455119710@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
>>> That's reasonable. If you would like higher-fidelity data, I can run loops of
>>> "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
>>> that for HEAD and 9.2 simultaneously. A day of logs from that should show
>>> clearly if HEAD is systematically worse than 9.2.

>> That sounds like a fine plan, please do it.

> Log files:
> HEAD: https://drive.google.com/uc?export=download&id=0B9IURs2-_2ZMakl2TjFHUlpvc1k
> 92: https://drive.google.com/uc?export=download&id=0B9IURs2-_2ZMYVZtY3VqcjBFX1k

Thanks for doing this.

I did a bit of analysis of these logs. There is no case in the HEAD log
where 'lock files all released' comes out more than 2.221 seconds after
'shutdown checkpoint complete'. So it doesn't appear that stats writing
is ever a big problem for your machine. The checkpoints themselves,
though ... here are the slowest few checkpoints on HEAD:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts order by total desc limit 10;
             ts             |  write  |  sync   |  other  |  total
----------------------------+---------+---------+---------+---------
 2016-02-09 03:33:14.671-08 | 287.691 | 355.528 |  12.509 | 655.728
 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 |  44.059 | 645.604
 2016-02-09 12:36:49.753-08 | 149.652 | 481.736 |  13.094 | 644.482
 2016-02-09 04:32:53.708-08 | 214.896 | 304.102 |  17.929 | 536.927
 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 |  51.541 | 492.249
 2016-02-09 08:27:04.365-08 |  14.479 | 298.043 | 100.586 | 413.108
 2016-02-09 07:37:57.999-08 | 280.960 |  10.408 |  14.287 | 305.655
 2016-02-09 07:06:30.121-08 | 155.331 |  93.433 |  23.814 | 272.578
 2016-02-09 17:45:34.422-08 | 149.471 | 104.185 |  18.598 | 272.254
 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 |  34.110 | 272.094
(10 rows)

and the same for 9.2:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts92 order by total desc limit 10;
             ts             |  write  |  sync   |  other  |  total
----------------------------+---------+---------+---------+---------
 2016-02-08 23:33:36.16-08  |  66.904 | 397.653 | 229.020 | 693.577
 2016-02-09 11:39:57.061-08 |   2.471 | 303.924 | 201.923 | 508.318
 2016-02-09 12:37:43.52-08  |  17.620 | 333.983 | 141.172 | 492.775
 2016-02-09 03:27:12.904-08 |  48.761 | 124.453 | 299.877 | 473.091
 2016-02-09 04:32:49.668-08 |  11.328 | 292.310 | 148.088 | 451.726
 2016-02-09 16:31:25.657-08 |   3.860 | 184.042 | 248.488 | 436.390
 2016-02-09 08:27:04.407-08 |  12.019 | 300.510 | 102.855 | 415.384
 2016-02-09 03:32:30.718-08 | 296.907 |  15.260 |   5.644 | 317.811
 2016-02-09 15:31:41.691-08 |   0.385 | 307.996 |   0.155 | 308.536
 2016-02-09 12:23:10.92-08  |  93.634 | 161.581 |  19.342 | 274.557
(10 rows)

It looks like you're going to have to set PGCTLTIMEOUT somewhere
north of 10 minutes to keep these animals from failing under load.
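
(In case anyone wants to reproduce this analysis: the chkpts and
chkpts92 tables above were derived from the "checkpoint complete"
lines in the two logs. A minimal sketch of one way to build them is
below. It assumes log_checkpoints output in the usual "write=X s,
sync=Y s, total=Z s" format and a log_line_prefix that starts with a
%m timestamp; the staging table, file name, and regexps are
illustrative rather than exactly what was used.)

-- Load the raw postmaster log, one line per row.  (\copy's default
-- text format may need adjusting if the log contains tabs or
-- backslashes.)
CREATE TABLE rawlog (line text);
\copy rawlog FROM 'head-postmaster.log'

-- Keep only "checkpoint complete" lines, pulling out the leading
-- timestamp and the write/sync/total figures.
CREATE TABLE chkpts AS
SELECT substring(line from '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+)')
         ::timestamptz AS ts,
       substring(line from 'write=([0-9.]+) s')::numeric AS write,
       substring(line from ', sync=([0-9.]+) s')::numeric AS sync,
       substring(line from ', total=([0-9.]+) s')::numeric AS total
FROM rawlog
WHERE line LIKE '%checkpoint complete:%';
-- chkpts92 is built the same way from the 9.2 log.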

Interestingly, we seem to have managed to greatly reduce the "other"
time (which I presume is basically mdpostckpt unlinking) since 9.2.
The worst case observed in HEAD is about 100s:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts order by total-write-sync desc limit 10;
             ts             |  write  |  sync   |  other  |  total
----------------------------+---------+---------+---------+---------
 2016-02-09 08:27:04.365-08 |  14.479 | 298.043 | 100.586 | 413.108
 2016-02-09 06:15:22.477-08 |  75.099 |  26.590 |  90.452 | 192.141
 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 |  51.541 | 492.249
 2016-02-09 15:25:01.586-08 |   0.015 |   2.985 |  47.822 |  50.822
 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 |  44.059 | 645.604
 2016-02-09 08:56:51.339-08 | 135.464 |  31.242 |  39.131 | 205.837
 2016-02-09 20:23:52.797-08 |   1.309 |  12.155 |  36.350 |  49.814
 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 |  34.110 | 272.094
 2016-02-09 19:28:33.154-08 |   0.575 |  20.434 |  33.803 |  54.812
 2016-02-09 03:47:02.57-08  |   0.066 |  37.475 |  33.119 |  70.660
(10 rows)

but 9.2's worst cases greatly exceed that:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts92 order by total-write-sync desc limit 10;
             ts             | write  |  sync   |  other  |  total
----------------------------+--------+---------+---------+---------
 2016-02-09 03:27:12.904-08 | 48.761 | 124.453 | 299.877 | 473.091
 2016-02-09 16:31:25.657-08 |  3.860 | 184.042 | 248.488 | 436.390
 2016-02-08 23:33:36.16-08  | 66.904 | 397.653 | 229.020 | 693.577
 2016-02-09 11:39:57.061-08 |  2.471 | 303.924 | 201.923 | 508.318
 2016-02-09 07:37:44.48-08  |  3.317 |   1.494 | 158.159 | 162.970
 2016-02-09 04:32:49.668-08 | 11.328 | 292.310 | 148.088 | 451.726
 2016-02-09 12:37:43.52-08  | 17.620 | 333.983 | 141.172 | 492.775
 2016-02-09 08:27:04.407-08 | 12.019 | 300.510 | 102.855 | 415.384
 2016-02-09 18:17:32.193-08 | 90.370 |  25.369 |  63.626 | 179.365
 2016-02-09 11:02:14.977-08 |  2.713 |   2.306 |  38.581 |  43.600
(10 rows)

I'm not really sure where that improvement came from. We could hardly
be unlinking fewer files overall, and the bottleneck seems to be in
the kernel, so what changed? The *average*, as opposed to worst-case,
checkpoint time has definitely grown since 9.2:

regression=# select avg(write) as write,avg(sync) as sync,avg(total-write-sync) as other, avg(total) as total from chkpts;
        write        |        sync        |       other        |        total
---------------------+--------------------+--------------------+---------------------
 14.4432267389340560 | 7.4136738934056007 | 4.9619313459801265 | 26.8188319783197832
(1 row)

regression=# select avg(write) as write,avg(sync) as sync,avg(total-write-sync) as other, avg(total) as total from chkpts92;
       write        |        sync        |       other        |        total
--------------------+--------------------+--------------------+---------------------
 9.1272341075794621 | 6.2059480440097800 | 4.0349553789731051 | 19.3681375305623472
(1 row)

I don't find that troublesome, since we're probably syncing more
data in the newer branch's tests. But the improvement in worst-case
behavior is puzzling.
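
If anyone cares to look at the full distributions rather than just
averages and worst cases, a query along these lines would do it (a
sketch; it assumes the database used for the analysis is 9.4 or
later, so that the ordered-set aggregates are available):

-- Tail versus average, side by side for both branches.
SELECT 'HEAD' AS branch,
       avg(total) AS avg_total,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY total) AS p99_total,
       max(total) AS max_total
FROM chkpts
UNION ALL
SELECT '9.2',
       avg(total),
       percentile_cont(0.99) WITHIN GROUP (ORDER BY total),
       max(total)
FROM chkpts92;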

regards, tom lane
