| From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
|---|---|
| To: | Stephen Frost <sfrost(at)snowman(dot)net> |
| Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: A few new options for CHECKPOINT |
| Date: | 2020-11-28 01:50:54 |
| Message-ID: | A07F8329-4870-40B2-9991-5016491AA11A@amazon.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 11/27/20, 10:58 AM, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:
> If you'd like to show that I'm wrong, and it's entirely possible that I
> am, then retry the above with actual load on the system, and also
> actually look at how much outstanding WAL you end up with given the
> different scenarios which has to be replayed during crash recovery.
I did a little experiment to show the behavior I'm referring to. I
used these settings:
checkpoint_completion_target = 0.9
checkpoint_timeout = 30s
max_wal_size = 20GB
WAL segment size is 64MB
I ran the following pgbench command for a few minutes before each
test:
pgbench postgres -T 3600 -c 64 -j 64 -N
For the first test, I killed Postgres just before an automatic, non-
immediate checkpoint completed.
2020-11-28 00:31:57 UTC::@:[51770]:LOG: checkpoint complete...
2020-11-28 00:32:00 UTC::@:[51770]:LOG: checkpoint starting: time
Killed Postgres at 00:32:26 UTC, 29 seconds after latest
checkpoint completed.
2020-11-28 00:32:42 UTC::@:[77256]:LOG: redo starts at 3CF/FD6B8BD0
2020-11-28 00:32:56 UTC::@:[77256]:LOG: redo done at 3D0/C94D1D00
Recovery took 14 seconds and replayed ~3.2 GB of WAL.
postgres=> SELECT pg_wal_lsn_diff('3D0/C94D1D00', '3CF/FD6B8BD0');
pg_wal_lsn_diff
-----------------
3420557616
(1 row)
For the second test, I killed Postgres just after an automatic, non-
immediate checkpoint completed.
2020-11-28 00:41:26 UTC::@:[77475]:LOG: checkpoint complete...
Killed Postgres at 00:41:26 UTC, just after latest checkpoint
completed.
2020-11-28 00:41:42 UTC::@:[8599]:LOG: redo starts at 3D3/152EDD78
2020-11-28 00:41:49 UTC::@:[8599]:LOG: redo done at 3D3/78358A40
Recovery took 7 seconds and replayed ~1.5 GB of WAL.
postgres=> SELECT pg_wal_lsn_diff('3D3/78358A40', '3D3/152EDD78');
pg_wal_lsn_diff
-----------------
1661381832
(1 row)
Granted, I used a rather aggressive checkpoint_timeout, but I think
this demonstrates that waiting for a non-immediate checkpoint to
complete can lower the amount of WAL needed for recovery, even though
it might not lower it as much as waiting for an immediate checkpoint
would.
Nathan
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2020-11-28 02:10:40 | Re: POC: postgres_fdw insert batching |
| Previous Message | Tomas Vondra | 2020-11-28 01:37:17 | Re: [PoC] Non-volatile WAL buffer |