From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: A few new options for CHECKPOINT |
Date: | 2020-11-28 01:50:54 |
Message-ID: | A07F8329-4870-40B2-9991-5016491AA11A@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11/27/20, 10:58 AM, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:
> If you'd like to show that I'm wrong, and it's entirely possible that I
> am, then retry the above with actual load on the system, and also
> actually look at how much outstanding WAL you end up with given the
> different scenarios which has to be replayed during crash recovery.
I did a little experiment to show the behavior I'm referring to. I
used these settings:
checkpoint_completion_target = 0.9
checkpoint_timeout = 30s
max_wal_size = 20GB
WAL segment size is 64MB
I ran the following pgbench command for a few minutes before each
test:
pgbench postgres -T 3600 -c 64 -j 64 -N
For the first test, I killed Postgres just before an automatic, non-
immediate checkpoint completed.
2020-11-28 00:31:57 UTC::@:[51770]:LOG: checkpoint complete...
2020-11-28 00:32:00 UTC::@:[51770]:LOG: checkpoint starting: time
Killed Postgres at 00:32:26 UTC, 29 seconds after latest
checkpoint completed.
2020-11-28 00:32:42 UTC::@:[77256]:LOG: redo starts at 3CF/FD6B8BD0
2020-11-28 00:32:56 UTC::@:[77256]:LOG: redo done at 3D0/C94D1D00
Recovery took 14 seconds and replayed ~3.2 GB of WAL.
postgres=> SELECT pg_wal_lsn_diff('3D0/C94D1D00', '3CF/FD6B8BD0');
pg_wal_lsn_diff
-----------------
3420557616
(1 row)
For the second test, I killed Postgres just after an automatic, non-
immediate checkpoint completed.
2020-11-28 00:41:26 UTC::@:[77475]:LOG: checkpoint complete...
Killed Postgres at 00:41:26 UTC, just after latest checkpoint
completed.
2020-11-28 00:41:42 UTC::@:[8599]:LOG: redo starts at 3D3/152EDD78
2020-11-28 00:41:49 UTC::@:[8599]:LOG: redo done at 3D3/78358A40
Recovery took 7 seconds and replayed ~1.5 GB of WAL.
postgres=> SELECT pg_wal_lsn_diff('3D3/78358A40', '3D3/152EDD78');
pg_wal_lsn_diff
-----------------
1661381832
(1 row)
Granted, I used a rather aggressive checkpoint_timeout, but I think
this demonstrates that waiting for a non-immediate checkpoint to
complete can lower the amount of WAL needed for recovery, even though
it might not lower it as much as waiting for an immediate checkpoint
would.
Nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2020-11-28 02:10:40 | Re: POC: postgres_fdw insert batching |
Previous Message | Tomas Vondra | 2020-11-28 01:37:17 | Re: [PoC] Non-volatile WAL buffer |