Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc:	hargudekishor(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
Date:	2023-07-17 08:06:15
Message-ID:	ZLT2d0b/Zhhgh3v1@paquier.xyz
Lists:	pgsql-bugs

On Mon, Jul 17, 2023 at 09:53:32AM +0200, Laurenz Albe wrote:
> On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
>> The scenario: checkpoint operations had been failing on the DB
>> server for the last 8 hours, which means no successful checkpoint
>> happened on the DB server in that time. Then the DB server crashed
>> due to exhausted disk space and did not come up as part of crash
>> recovery.
>
> Mistake #1: you did not monitor disk space.

max_wal_size is a critical setting to adjust. It is usually
recommended to put pg_wal/ on its own partition so that the space
allocated for WAL records is predictable across checkpoints. This is
not an exact science: max_wal_size is a soft limit, so one usually
needs some extra margin on the WAL partition. There have been some
patches floating around to make it a hard limit as well, but I
don't think we've ever agreed on the semantics that would be
acceptable when the upper limit is reached.
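
As a rough illustration of the sizing point, here is a minimal Python
sketch that checks whether a WAL partition leaves some margin above
max_wal_size. The function names, the example path and the 50% margin
are all assumptions made for illustration; nothing here is a
recommended value or a PostgreSQL API. It only compares the partition
size against max_wal_size plus a margin.

```python
import shutil


def wal_partition_report(total_bytes, max_wal_size_bytes, margin_fraction=0.5):
    """Compare a partition size against max_wal_size plus a margin.

    margin_fraction is a hypothetical safety factor (0.5 = 50% extra);
    max_wal_size is a soft limit, so WAL can temporarily exceed it.
    Returns (recommended_bytes, ok).
    """
    recommended = int(max_wal_size_bytes * (1 + margin_fraction))
    return recommended, total_bytes >= recommended


def check_wal_partition(wal_dir, max_wal_size_bytes, margin_fraction=0.5):
    """Read the size of the filesystem holding wal_dir and report on it."""
    usage = shutil.disk_usage(wal_dir)  # named tuple: total, used, free
    recommended, ok = wal_partition_report(
        usage.total, max_wal_size_bytes, margin_fraction
    )
    return {
        "total": usage.total,
        "free": usage.free,
        "recommended": recommended,
        "ok": ok,
    }


if __name__ == "__main__":
    GiB = 1024 ** 3
    # "." stands in for the pg_wal/ mount point; adjust for a real system.
    print(check_wal_partition(".", max_wal_size_bytes=1 * GiB))
```

A check like this only covers sizing. It does not replace monitoring
free space on the partition over time, since failing checkpoints
prevent old WAL segments from being recycled.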
--
Michael

In response to

Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG at 2023-07-17 07:53:32 from Laurenz Albe
