| From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
|---|---|
| To: | Ivan Bykov <I(dot)Bykov(at)modernsys(dot)ru>, i(dot)bykov(at)ftdata(dot)ru |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, alvherre(at)kurilemu(dot)de, Kirill Reshke <reshkekirill(at)gmail(dot)com> |
| Subject: | Re: IPC/MultixactCreation on the Standby server |
| Date: | 2025-11-14 12:11:50 |
| Message-ID: | F1EE5EBE-D7E0-41C6-900B-2D650CC6E014@yandex-team.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Ivan!
Thanks for the review.
> On 24 Oct 2025, at 19:33, Ivan Bykov <I(dot)Bykov(at)modernsys(dot)ru> wrote:
>
> I believe that we should provide more comprehensible feedback to developers
> by interrupting the deadlock state with a psql timeout. 15 seconds seems safe
> enough to distinguish between slow node operation and deadlock issue.
Makes sense, but I used $PostgreSQL::Test::Utils::timeout_default seconds.
> On 24 Oct 2025, at 22:16, Bykov Ivan <i(dot)bykov(at)ftdata(dot)ru> wrote:
>
> In GetNewMultiXactId() this code may lead to error
> ---
> ExtendMultiXactOffset(MultiXactState->nextMXact + 1);
> ---
> If MultiXactState->nextMXact = MaxMultiXactId (0xFFFFFFFF)
> we will not extend MultiXactOffset as we should
It will extend SLRU, this calculations are handled in ExtendMultiXactOffset(). Moreover, we should eliminate "multi != FirstMultiXactId" bailout condition too. I've extended comments and added Assert(). I've added test for this, but it's whacky-hacky with dd, resetwal and such. But it uncovered wrond assertion in this code:
/* This is an overflow-only branch */
Assert(next_entryno == 0 || next == FirstMultiXactId);
This "next == FirstMultiXactId" was missing.
I also included Kirill's suggestions.
GPT is also worried by initialization of page 0, but I don't fully understand it's concerns:
"current MXID’s page is never extended: GetNewMultiXactId() now calls ExtendMultiXactOffset(MultiXactState->nextMXact + 1) instead of extending the page that will hold result. This skips zeroing the page for the MXID we are about to assign, so the first allocation on a fresh cluster (or after wrap) tries to write into an uninitialized SLRU page and will fail/crash once the buffer manager attempts I/O."
I think we have now good coverage of what happens on fresh cluster and after a wraparound.
"wraparound can’t recreate page 0: ExtendMultiXactOffset() now asserts multi != FirstMultiXactId, yet after wrap the first ID on page 0 is exactly FirstMultiXactId. Because callers no longer pass that value, page 0 is never re-created and we keep reusing stale offsets."
Page 0 is actually created via different path on cluster initialization. Though everything works fine on wraparound.
Thanks!
Best regards, Andrey Borodin.
| Attachment | Content-Type | Size |
|---|---|---|
| v10-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch | application/octet-stream | 17.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Banck | 2025-11-14 12:15:43 | Re: [Patch] Mention md5 is deprecated in postgresql.conf.sample |
| Previous Message | Daniel Gustafsson | 2025-11-14 12:11:15 | Re: BUG #19095: Test if function exit() is used fail when linked static |