| From: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, 段坤仁(刻韧) <duankunren(dot)dkr(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery |
| Date: | 2026-03-22 14:15:38 |
| Message-ID: | CALdSSPghO2YGyy4BN_rRxdG7dBSf1-pmi2D4NARk14+HyZoHhg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi!
I can see that the back-branches commit was included into master [0].
I think this is good.
On Sun, 22 Mar 2026 at 16:10, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 20/03/2026 19:05, Andrey Borodin wrote:
> >> On 20 Mar 2026, at 18:14, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >>
> >> Zeroing the page again is dangerous because the CREATE_ID records can be out of order. The page might already contain some later multixids, and zeroing will overwrite them.
> >
> > I see only cases when it's not a problem: we zeroed page, did not flush it, thus did not extend the file, crashed, tested FS, zeroed page once more, overwrote again by replaying WAL, no big deal.
> > We should never zero a page with offsets, that will not be replayed by WAL.
>
> I think we're in agreement, but I want to verify because this is
> important to get right. I was replying to this:
>
> > If we are sure buffers have no this page we can detect it via FS.
> > Otherwise... nothing bad can happen, actually. We might get false positive and zero the page once more.
>
> My point is that if we rely on SimpleLruDoesPhysicalPageExist(), and it
> ever returns false even though we had already initialized the page, you
> can lose data. It's *not* ok to zero a page again that was zeroed
> earlier already, because we might have already written some real data on it.
+1. Even if we manage to compose a "fix" that zeroes a page more than
once, this "fix" will be non-future-profing and we will corrupt the
database if anything goes even slightly wrong.
> Let's consider this wal stream, generated with old minor version:
>
> ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047
>
> 2048 is the first multixid on the page. When WAL replay gets to the
> CREATE_ID:2047 record, it will enter the backwards-compatibility
> codepath and needs to determine if the page containing the next mxid
> (2048) already exists.
>
> In this WAL sequence, the page already exist because the ZERO_PAGE
> record was replayed earlier. But if we just call
> SimpleLruDoesPhysicalPageExist(), it will return 'false' because the
> page was not flushed to disk yet. If we believe that and zero the page
> again, we will lose data (the offset for mxid 2049).
>
> The opposite cannot happen: if SimpleLruDoesPhysicalPageExist() returns
> true, then it does really exist.
>
> So indeed we can only trust SimpleLruDoesPhysicalPageExist() if we are
> sure that the page is not sitting in the buffers.
+1
> Attached is a new version. I updated the comment to explain that.
>
> I also added another safety measure: before calling
> SimpleLruDoesPhysicalPageExist(), flush all the SLRU buffers. That way,
> SimpleLruDoesPhysicalPageExist() should definitely return the correct
> answer. That shouldn't be necessary because the check with
> last_initialized_offsets_page should cover all the cases where a page
> that extended the file is sitting in the buffers, but better safe than
> sorry.
>
> - Heikki
I played with v2 and was unable to fool it into corrupting db. So v2
looks good to me.
[0] https://git.postgresql.org/cgit/postgresql.git/commit/?id=516310ed4dba89bd300242df0d56b4782f33ed4d
--
Best regards,
Kirill Reshke
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Junwang Zhao | 2026-03-22 14:16:51 | Re: Copy from JSON FORMAT. |
| Previous Message | 段坤仁 (刻韧) | 2026-03-22 13:09:05 | 回复:Bug in MultiXact replay compat logic for older minor version after crash-recovery |