Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Run end-of-recovery checkpoint in non-wait mode or skip it entirely for faster server availability?
Date: 2022-03-25 07:40:27
Message-ID: CALj2ACUbSQJ2T6KiuWyChSiYiwQy17mn-kOqNT5oSYito3HiaQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Currently, postgres runs the end-of-recovery (EOR) checkpoint in wait
mode, meaning the server can take longer before it opens up for
connections. The EOR checkpoint can take a while if the server did a
lot of work during crash recovery, for example if it replayed many WAL
records, created many snapshot or mapping files, or dirtied many
buffers.

Since the server now spins up the checkpointer process [1] while the
startup process performs recovery, wouldn't it be a good idea to make
the EOR checkpoint completely optional for users, or at least to run
it in non-wait mode, so that the server becomes available faster? If
the user chooses to skip the EOR checkpoint, the next checkpoint cycle
will take care of its work; if the user chooses non-wait mode (without
the CHECKPOINT_WAIT flag), the checkpointer will run the EOR checkpoint
in the background. Of course, users choosing either option must be
aware that extra recovery work will be needed if a crash happens
between the point where the EOR checkpoint is skipped or started in
non-wait mode and the next checkpoint. The advantage users get in
return is faster server availability.
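
To make the three choices concrete, here is a rough sketch of how the
request flags might be picked. It is only an illustration, not a patch:
the EorCheckpointMode enum and both helper functions are hypothetical,
and the flag values are stand-ins that I am assuming mirror the
CHECKPOINT_* names in xlog.h. It also assumes the current EOR request
combines CHECKPOINT_END_OF_RECOVERY, CHECKPOINT_IMMEDIATE and
CHECKPOINT_WAIT.

```c
#include <stdbool.h>

/*
 * Stand-ins for the checkpoint request flags. The names follow
 * PostgreSQL's xlog.h; the values are assumptions for this sketch.
 */
#define CHECKPOINT_END_OF_RECOVERY 0x0002
#define CHECKPOINT_IMMEDIATE       0x0004
#define CHECKPOINT_WAIT            0x0020

/* Hypothetical user-visible setting; nothing like it exists today. */
typedef enum EorCheckpointMode
{
    EOR_CHECKPOINT_WAIT,    /* assumed current behavior: block until done */
    EOR_CHECKPOINT_NOWAIT,  /* request it, but do not wait for completion */
    EOR_CHECKPOINT_SKIP     /* leave the work to the next checkpoint */
} EorCheckpointMode;

/*
 * Return the flags to pass to the checkpoint request for the given
 * mode, or 0 when no EOR checkpoint should be requested at all.
 */
static int
eor_checkpoint_flags(EorCheckpointMode mode)
{
    int flags = CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE;

    switch (mode)
    {
        case EOR_CHECKPOINT_WAIT:
            return flags | CHECKPOINT_WAIT;
        case EOR_CHECKPOINT_NOWAIT:
            return flags;
        case EOR_CHECKPOINT_SKIP:
            return 0;
    }
    /* Unknown mode: fall back to the conservative wait behavior. */
    return flags | CHECKPOINT_WAIT;
}

/* True if the startup process would block on the EOR checkpoint. */
static bool
eor_blocks_startup(EorCheckpointMode mode)
{
    return (eor_checkpoint_flags(mode) & CHECKPOINT_WAIT) != 0;
}
```

Only the wait mode holds up the startup process; in the other two
modes the server could accept connections right away, at the cost of
the longer recovery window described above. The sketch does not cover
what else the skip mode would need to do in place of the checkpoint.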

Thanks a lot Thomas for the internal discussion.

Thoughts?

[1]
commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
Author: Thomas Munro <tmunro(at)postgresql(dot)org>
Date: Mon Aug 2 17:32:20 2021 +1200

Run checkpointer and bgwriter in crash recovery.

Regards,
Bharath Rupireddy.
