| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | 段坤仁(刻韧) <duankunren(dot)dkr(at)alibaba-inc(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, x4mmm <x4mmm(at)yandex-team(dot)ru> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery |
| Date: | 2026-03-23 10:35:39 |
| Message-ID: | 02312231-7121-4182-bac3-e5e140c62a19@iki.fi |
| Lists: | pgsql-hackers |
On 22/03/2026 15:09, 段坤仁(刻韧) wrote:
> On 20/03/2026 16:19, Heikki Linnakangas wrote:
>> it means that tracking the latest page we have zeroed is not merely
>> an optimization to avoid excessive SimpleLruDoesPhysicalPageExist()
>> calls, it's needed for correctness.
>
> Agreed.
>
> On 20/03/2026 18:14, Heikki Linnakangas wrote:
>> I also added another safety measure: before calling
>> SimpleLruDoesPhysicalPageExist(), flush all the SLRU buffers.
>
> This is more robust than scanning the SLRU buffers first and only
> calling SimpleLruDoesPhysicalPageExist() on a miss, which would
> rely on the SLRU eviction invariant.
>
> I walked through the scenarios I could think of. Let N be the last
> multixid on offset page P, so N+1 falls on page P+1.
>
> (a) Old-version WAL (CREATE_ID:N before ZERO_OFF_PAGE:P+1):
> last_initialized_offsets_page = P from earlier ZERO_OFF_PAGE.
> init_needed = (P == P) = true -> init P+1. Correct.
> Later ZERO_OFF_PAGE:P+1 is skipped via pre_initialized_offsets_page.
>
> (b) Crash-restart, page P+1 not on disk (the original bug):
> last_initialized_offsets_page = -1, fallback path fires.
> SimpleLruDoesPhysicalPageExist(P+1) = false -> init. Correct.
>
> (c) Crash-restart, page P+1 already on disk:
> Same fallback, SimpleLruDoesPhysicalPageExist(P+1) = true -> skip.
> last_initialized_offsets_page stays -1 until the next
> ZERO_OFF_PAGE switches back to the fast path.
>
> (d) Out-of-order CREATE_IDs (ZERO_OFF_PAGE:P+1 -> CREATE_ID:N+1 ->
> CREATE_ID:N+2 -> CREATE_ID:N):
> N+1 and N+2 don't cross a page boundary, compat logic not entered.
> CREATE_ID:N: init_needed = (P+1 == P) = false -> skip.
> Page P+1 is not re-zeroed, data from N+1/N+2 preserved.
>
> (e) Consecutive page crossings (N on page P, later M on page P+1):
> After init of P+1: last_initialized_offsets_page = P+1.
> CREATE_ID:M: init_needed = (P+1 == P+1) = true -> init P+2.
> Tracking advances monotonically across page boundaries.
>
> The logic looks correct to me in all the cases above.
Ok, committed. Thank you!
- Heikki
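
For readers following the thread, the five scenarios quoted above can be replayed against a small toy model. This is only a sketch in Python, not the committed C code. The names last_initialized_offsets_page, pre_initialized_offsets_page and init_needed come from the message. ReplayModel, ENTRIES_PER_PAGE, the log list and the in-memory "disk" are hypothetical stand-ins, and page_exists() merely imitates what SimpleLruDoesPhysicalPageExist() would answer. P is taken as page 0, so N = 3 is the last multixid on page P.

```python
ENTRIES_PER_PAGE = 4  # toy value for illustration, not PostgreSQL's real page capacity


class ReplayModel:
    """Toy model of the offsets-page init decision during WAL replay."""

    def __init__(self, pages_on_disk=()):
        # page number -> multixids recorded on it (a zeroed page is an empty set)
        self.pages = {p: set() for p in pages_on_disk}
        self.last_initialized_offsets_page = -1  # unknown right after a restart
        self.pre_initialized_offsets_page = -1
        self.log = []  # hypothetical trace of the decisions taken

    def _zero(self, page):
        self.pages[page] = set()
        self.last_initialized_offsets_page = page

    def page_exists(self, page):
        # Stand-in for SimpleLruDoesPhysicalPageExist()
        return page in self.pages

    def zero_off_page(self, page):
        # A page already initialized by the compat path must not be zeroed again.
        if page == self.pre_initialized_offsets_page:
            self.pre_initialized_offsets_page = -1
            self.log.append(("skip_zero", page))
            return
        self._zero(page)
        self.log.append(("zero", page))

    def create_id(self, multi):
        page = multi // ENTRIES_PER_PAGE
        next_page = (multi + 1) // ENTRIES_PER_PAGE
        if next_page != page:  # multi is the last entry on its page
            if self.last_initialized_offsets_page != -1:
                # Fast path: trust the tracked page.
                init_needed = self.last_initialized_offsets_page == page
            else:
                # Fallback after restart: ask the "disk".
                init_needed = not self.page_exists(next_page)
            if init_needed:
                self._zero(next_page)
                self.pre_initialized_offsets_page = next_page
                self.log.append(("compat_init", next_page))
        # The model does not check that the multixid's own page exists.
        self.pages.setdefault(page, set()).add(multi)


# Scenario (d): the late CREATE_ID:N must not wipe the entries for N+1 and N+2.
m = ReplayModel(pages_on_disk=[0])
m.zero_off_page(1)
for multi in (4, 5, 3):
    m.create_id(multi)
print(m.pages[1])
```

In this model the fast path and the fallback path are mutually exclusive, which mirrors case (c): last_initialized_offsets_page stays at -1 until a ZERO_OFF_PAGE record is replayed.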