Re: Timeline switching with partial WAL records can break replica recovery

From: Alena Vinter <dlaaren8(at)gmail(dot)com>
To: Artem Gavrilov <artem(dot)gavrilov(at)percona(dot)com>
Cc: Nataliia <k(dot)natalissa(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Timeline switching with partial WAL records can break replica recovery
Date: 2026-01-14 07:11:45
Message-ID: CAGWv16LtOA8BdL4s0t=7Wnn7wn=7hdzr+8H_un+-ZPBkwyn0AA@mail.gmail.com
Lists: pgsql-hackers

> let the new timeline start after incomplete contrecord
This approach looks like it would avoid the archiving issues. On
reflection, I agree that we shouldn't alter the completed timeline;
instead, the missing contrecord should be detected in the next timeline.
However, there is a problem with the ordering of the `EndOfRecovery` and
`MissingContrecord` records: `EndOfRecovery` must be written to WAL at the
very beginning of the new timeline, so `MissingContrecord` cannot come
before it. This conflicts with the current design, so we should rethink
the solution.
The simplest solution that comes to mind is to extend the `EndOfRecovery`
record with an `overwritten_lsn` field. As far as I can see, it wouldn't
change the code much. What do you think?
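
To make the idea concrete, here is a rough, standalone sketch of an
end-of-recovery payload that carries the extra field. Everything in it is
an assumption for illustration: the `sketch_*` names, the stand-in
typedefs, and the field layout are hypothetical and are not the actual
PostgreSQL definitions. Only `overwritten_lsn` comes from the proposal
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in typedefs so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;
typedef int64_t TimestampTz;

#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical end-of-recovery payload (NOT the real PostgreSQL struct).
 * overwritten_lsn is the proposed addition: the start of the partial
 * record left at the end of the old timeline, or InvalidXLogRecPtr if
 * the old timeline ended on a complete record.
 */
typedef struct sketch_end_of_recovery
{
	TimestampTz end_time;
	TimeLineID	this_tli;		/* new timeline */
	TimeLineID	prev_tli;		/* timeline we forked off from */
	XLogRecPtr	overwritten_lsn;	/* proposed extra field */
} sketch_end_of_recovery;

/* Build the payload that would be the first record on the new timeline. */
static sketch_end_of_recovery
sketch_make_end_of_recovery(TimestampTz now, TimeLineID new_tli,
							TimeLineID old_tli, XLogRecPtr partial_rec_start)
{
	sketch_end_of_recovery rec;

	assert(new_tli > old_tli);	/* a new timeline always gets a higher ID */

	rec.end_time = now;
	rec.this_tli = new_tli;
	rec.prev_tli = old_tli;
	rec.overwritten_lsn = partial_rec_start;
	return rec;
}

/* A replica replaying the record checks whether a contrecord was dropped. */
static bool
sketch_has_overwritten_contrecord(const sketch_end_of_recovery *rec)
{
	return rec->overwritten_lsn != InvalidXLogRecPtr;
}
```

With a layout like this, a single record at the start of the new timeline
would carry both pieces of information, so the ordering question between
two separate records would not arise. Whether replay and archiving can
rely on it in practice is exactly what needs discussion.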

---
Alena Vinter
