Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: "段坤仁(刻韧)" <duankunren(dot)dkr(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery
Date: 2026-03-20 11:55:52
Message-ID: F513DFBB-DCFE-4B99-B01E-DDB04414857C@yandex-team.ru
Lists: pgsql-hackers

> On 19 Mar 2026, at 23:11, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> :-(. This is a gift that keeps giving.

Well, maybe we could leave that deadlock in place for some time...

> Idea 3:
>
> I think a better fix is to accept that our tracking is a little imprecise and use SimpleLruDoesPhysicalPageExist() to check if the page exists. I suspect that's too expensive to do on every RecordNewMultiXact() call that crosses a page, but perhaps we could do it once at StartupMultiXact().
>
> Or perhaps track last-zeroed page separately from latest_page_number, and if we haven't seen any XLOG_MULTIXACT_ZERO_OFF_PAGE records yet after startup, call SimpleLruDoesPhysicalPageExist() to determine if initialization is needed. Attached patch does that.

SimpleLruDoesPhysicalPageExist() does not detect recently zeroed pages that are still only in buffers, because it goes directly to the file system.
I tried this approach when implementing the deadlock fix; it did not work for me.
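
To illustrate the point above, here is a toy model in C. It is not PostgreSQL code: ToySlru and every toy_* function are hypothetical names, and using the on-disk file size as the "physical" check is an assumption made only for this sketch. It shows how a page that was zeroed in a buffer but not yet written out is invisible to a check that consults only the file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192
#define TOY_MAX_PAGES 16

/* Hypothetical toy model of one SLRU segment; not PostgreSQL code. */
typedef struct ToySlru
{
    long on_disk_bytes;             /* stands in for the segment file size */
    bool buffered[TOY_MAX_PAGES];   /* page currently held in a buffer */
    bool dirty[TOY_MAX_PAGES];      /* buffer not yet written to disk */
} ToySlru;

void toy_init(ToySlru *s)
{
    memset(s, 0, sizeof(*s));
}

/* Zero a page in memory only; nothing reaches the file yet. */
void toy_zero_page(ToySlru *s, int pageno)
{
    assert(pageno >= 0 && pageno < TOY_MAX_PAGES);
    s->buffered[pageno] = true;
    s->dirty[pageno] = true;
}

/* Write back every dirty buffer, extending the file as needed. */
void toy_flush(ToySlru *s)
{
    for (int i = 0; i < TOY_MAX_PAGES; i++)
    {
        if (s->dirty[i])
        {
            long end = (long) (i + 1) * TOY_PAGE_SIZE;

            if (end > s->on_disk_bytes)
                s->on_disk_bytes = end;
            s->dirty[i] = false;
        }
    }
}

/* File-system-only check: consults the file size, never the buffers. */
bool toy_physical_page_exists(const ToySlru *s, int pageno)
{
    return s->on_disk_bytes >= (long) (pageno + 1) * TOY_PAGE_SIZE;
}

/* Buffer-aware check: a buffered page counts as existing. */
bool toy_page_exists(const ToySlru *s, int pageno)
{
    return s->buffered[pageno] || toy_physical_page_exists(s, pageno);
}
```

In this model, after toy_zero_page(&s, 3) and before toy_flush(&s), toy_physical_page_exists(&s, 3) returns false while toy_page_exists(&s, 3) returns true. That gap is the reason a file-system-only probe can wrongly conclude that a page still needs initialization.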

Best regards, Andrey Borodin.
