| From: | Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com> |
|---|---|
| To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de> |
| Subject: | Re: Fix crash during recovery when redo segment is missing |
| Date: | 2025-12-04 06:31:24 |
| Message-ID: | CAMm1aWYfTo6ODrmC6VUj-Dswii_fy0wSC8734Zros5cLRohsdw@mail.gmail.com |
| Lists: | pgsql-hackers |
The patch wasn’t applying cleanly on master, so I’ve rebased it and
also added it to the PG19‑4 CommitFest:
https://commitfest.postgresql.org/patch/6279/
Please review and share your feedback.
Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft
On Fri, Feb 21, 2025 at 4:29 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> In [1], Andres reported a bug where PostgreSQL crashes during recovery
> if the segment containing the redo pointer does not exist. I have
> attempted to address this issue and I am sharing a patch for the same.
>
> The problem is that PostgreSQL does not PANIC up front when the redo
> LSN and checkpoint LSN are in separate segments and the file
> containing the redo LSN is missing; recovery continues and crashes
> later. Andres has provided a detailed analysis of the behavior across
> different settings and versions. Please refer to [1] for more
> information.
>
> The patch fixes this by verifying, in InitWalRecovery(), that the REDO
> location exists once the checkpoint record has been read successfully.
> This prevents control from reaching PerformWalRecovery() unless the WAL
> file containing the redo record exists. A new test script,
> 044_redo_segment_missing.pl, has been added to validate this. To
> place the redo record in a different WAL file from the checkpoint
> record, I wait for the checkpoint start message and then issue a
> pg_switch_wal(), which should occur before the checkpoint completes.
> Then I crash the server, and on restart it should log an appropriate
> error indicating that it could not find the redo location. Please let
> me know if there is a better way to reproduce this behavior. I have
> tested and verified this with the various scenarios Andres pointed out
> in [1]. Please note that this patch does not address error checking in
> StartupXLOG(), CreateCheckPoint(), etc., nor does it focus on cleaning
> up existing code.
>
> Attaching the patch. Please review and share your feedback. Thanks to
> Andres for spotting the bug and providing the detailed report [1].
>
> [1]: https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
>
> Best Regards,
> Nitin Jadhav
> Azure Database for PostgreSQL
> Microsoft
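For readers who want to see the shape of the check described in the quoted message, here is a minimal standalone sketch. It is not the patch and does not use PostgreSQL's internal APIs: segment_of(), wal_file_name() and redo_segment_available() are hypothetical helpers, the 16 MB segment size is the default and is assumed here, and the list of file names stands in for the contents of pg_wal. It only illustrates mapping the redo LSN to its WAL segment file name and refusing to continue when that file is absent.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default WAL segment size of 16 MB. */
#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)
#define WAL_NAME_LEN 24

/* Hypothetical helper: segment number that contains the given LSN. */
uint64_t segment_of(uint64_t lsn)
{
    return lsn / WAL_SEGMENT_SIZE;
}

/*
 * Hypothetical helper: build the 24-hex-digit WAL file name
 * (timeline, high part, low part) for a segment number.
 */
void wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    uint64_t per_id = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;

    assert(len > WAL_NAME_LEN);
    snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli, (uint32_t) (segno / per_id), (uint32_t) (segno % per_id));
}

/*
 * Hypothetical check: return true only if the segment holding the redo
 * LSN is among the available files. A caller would stop with an error
 * before starting replay when this returns false.
 */
bool redo_segment_available(uint64_t redo_lsn, uint32_t tli,
                            const char *const *available, size_t n)
{
    char name[WAL_NAME_LEN + 1];

    wal_file_name(name, sizeof name, tli, segment_of(redo_lsn));
    for (size_t i = 0; i < n; i++)
        if (strcmp(available[i], name) == 0)
            return true;
    return false;
}
```

With a redo LSN of 0/1000028 on timeline 1, the sketch looks for 000000010000000000000001. If only the checkpoint's segment 000000010000000000000002 is present, the check fails, which corresponds to the scenario the test script sets up with pg_switch_wal().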