Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: hargudekishor(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
Date: 2023-07-17 08:06:15
Message-ID: ZLT2d0b/Zhhgh3v1@paquier.xyz
Lists: pgsql-bugs

On Mon, Jul 17, 2023 at 09:53:32AM +0200, Laurenz Albe wrote:
> On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
>> The scenario is that checkpoints had been failing on the DB server
>> for the last 8 hours, which means no checkpoint had completed
>> successfully in that time. The DB server then crashed because disk
>> space was exhausted, and it did not come back up during crash
>> recovery.
>
> Mistake #1: you did not monitor disk space.

max_wal_size is a critical setting to tune. It is usually
recommended to put pg_wal/ on its own partition so that the space
allocated for WAL records is predictable across checkpoints. This is
not a perfect science: max_wal_size is a soft limit, so one usually
needs some extra margin on a WAL partition. There have also been some
patches floating around to make it a hard limit, but I don't think
we've ever agreed on acceptable semantics for what should happen when
that upper limit is reached.
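A minimal monitoring sketch, assuming pg_wal/ sits on a dedicated partition and that max_wal_size is copied from the output of `SHOW max_wal_size` (for example "1GB"). The function names and the 1.5 margin factor are hypothetical, chosen only for illustration. The sketch reads filesystem statistics through Python's shutil.disk_usage and does not connect to PostgreSQL. Because max_wal_size is a soft limit, it reports two things: whether the partition is large enough for the limit plus a margin, and whether usage has grown past the limit. In a situation like the one reported, that second condition would be a sign that WAL is piling up because checkpoints keep failing.

```python
import shutil

# PostgreSQL memory units are powers of 1024 (1kB = 1024 bytes).
PG_UNITS = {"TB": 1024**4, "GB": 1024**3, "MB": 1024**2, "kB": 1024, "B": 1}


def parse_pg_size(value: str) -> int:
    """Convert a setting string such as '1GB' (as printed by SHOW) to bytes."""
    value = value.strip()
    # Dict order checks two-letter suffixes before "B", so "1GB" never matches "B".
    for suffix, factor in PG_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)].strip()) * factor
    raise ValueError(f"expected an explicit unit in {value!r}")


def required_wal_partition_bytes(max_wal_size: str, margin: float = 1.5) -> int:
    """Size a dedicated WAL partition as max_wal_size times a safety margin.

    The 1.5 default is a hypothetical value, not a PostgreSQL recommendation.
    A margin is needed at all because max_wal_size is only a soft limit.
    """
    return int(parse_pg_size(max_wal_size) * margin)


def check_wal_space(wal_dir: str, max_wal_size: str, margin: float = 1.5) -> dict:
    """Report on the filesystem that holds wal_dir."""
    usage = shutil.disk_usage(wal_dir)
    limit = parse_pg_size(max_wal_size)
    return {
        "partition_big_enough": usage.total >= required_wal_partition_bytes(max_wal_size, margin),
        # On a dedicated WAL partition, usage above max_wal_size may mean
        # checkpoints are not completing (or something else is retaining WAL).
        "over_soft_limit": usage.used > limit,
        "free_bytes": usage.free,
    }
```

For example, a cron job or monitoring agent could call `check_wal_space("/path/to/pg_wal", "1GB")` and raise an alert when `over_soft_limit` becomes true. A margin alone does not cap WAL growth while checkpoints keep failing. For that reason, alerting on this condition is arguably more useful than choosing the exact margin factor.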
--
Michael