From: | Alyona Vinter <dlaaren8(at)gmail(dot)com> |
---|---|
To: | Nataliia <k(dot)natalissa(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Timeline switching with partial WAL records can break replica recovery |
Date: | 2025-09-10 09:07:35 |
Message-ID: | CAGWv16JqHWZRnWUcTTEMF=0f+zqpboU4t+eKMANeTJObecYPXA@mail.gmail.com |
Lists: | pgsql-hackers |
Hi!
I've noticed an issue with pg_rewind caused by my patches.
Here are some logs that demonstrate the issue:
pg_rewind: Source timeline history:
pg_rewind: 1: 0/00000000 - 0/03002048
pg_rewind: 2: 0/03002048 - 0/00000000
pg_rewind: Target timeline history:
pg_rewind: 1: 0/00000000 - 0/00000000
pg_rewind: servers diverged at WAL location 0/03002048 on timeline 1
pg_rewind: error: could not find previous WAL record at 0/03002048: invalid record length at 0/03002048: expected at least 24, got 0
When a common timeline ends with an overwritten contrecord, the divergence
point may not be the start of a valid WAL record on the target. pg_rewind
then fails with the error shown above, and the rewind becomes impossible.
To handle this case, I suggest the following: when the common timeline is
unfinished on the target, look for the checkpoint preceding the divergence
point by starting the search from the last checkpoint on the target rather
than from the divergence point itself. This ensures we always begin from a
known-valid position in WAL.
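As a rough illustration of the idea, here is a toy Python model. It is only a sketch, not the patch and not pg_rewind's actual code: the Record type, the function names, and every LSN except the divergence point 0/03002048 are made up for the example. WAL on the target is modelled as a mapping from record start LSN to a record holding a back-link and a checkpoint flag.

```python
from typing import Dict, NamedTuple, Optional


class Record(NamedTuple):
    prev: Optional[int]   # start LSN of the previous record (None for the first one)
    is_checkpoint: bool


def find_checkpoint_before(wal: Dict[int, Record], divergence: int,
                           search_start: int) -> int:
    """Walk backwards from search_start and return the start LSN of the
    first checkpoint record found that lies before the divergence point."""
    lsn: Optional[int] = search_start
    while lsn is not None:
        if lsn not in wal:
            # Toy stand-in for "could not find previous WAL record".
            raise ValueError(f"no valid WAL record starts at {lsn:#x}")
        rec = wal[lsn]
        if rec.is_checkpoint and lsn < divergence:
            return lsn
        lsn = rec.prev
    raise ValueError("no checkpoint found before the divergence point")


def choose_search_start(divergence: int, last_checkpoint: int,
                        timeline_finished_on_target: bool) -> int:
    """Hypothetical rule: if the common timeline is unfinished on the
    target, start from the target's last checkpoint, which is known to be
    a valid record start; otherwise start from the divergence point."""
    return divergence if timeline_finished_on_target else last_checkpoint


# Hypothetical target WAL. No record starts at the divergence point,
# because the partial record there was overwritten on the source.
DIVERGENCE = 0x03002048
LAST_CHECKPOINT = 0x03000100
target_wal = {
    0x03000028: Record(prev=None, is_checkpoint=False),
    0x03000100: Record(prev=0x03000028, is_checkpoint=True),
    0x03001F00: Record(prev=0x03000100, is_checkpoint=False),
}

start = choose_search_start(DIVERGENCE, LAST_CHECKPOINT,
                            timeline_finished_on_target=False)
checkpoint = find_checkpoint_before(target_wal, DIVERGENCE, start)
```

In this model, starting the search at DIVERGENCE raises an error because no record starts there, while starting at LAST_CHECKPOINT finds the checkpoint immediately.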
I'd appreciate any feedback!
Best Regards,
Alyona Vinter
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Handle-WAL-timeline-switches-with-incomplete-records.patch | text/x-patch | 10.0 KB |
v3-0002-Removed-assertion-in-walsummarizer.patch | text/x-patch | 1.2 KB |
v3-0003-Handle-rewind-failure-when-a-timeline-ends-with-an-overwritten-contrecord.patch | text/x-patch | 5.3 KB |