Re: IPC/MultixactCreation on the Standby server

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Ivan Bykov <i(dot)bykov(at)modernsys(dot)ru>, Kirill Reshke <reshkekirill(at)gmail(dot)com>
Subject: Re: IPC/MultixactCreation on the Standby server
Date: 2025-11-27 03:39:01
Message-ID: 5F9FB408-847B-457D-B01E-0F634767D881@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Heikki,

I just reviewed this patch. As offset[x+1] anyway equals offset[x]+nmembers, pre-write offset[x+1] seems a very clever solution.

I got a few questions/comments as below:

> On Nov 27, 2025, at 04:59, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> Here's a new version of this. Notable changes:
>
> - I reverted the changes to ExtendMultiXactOffset(), so that it deals with wraparound and FirstMultiXactId the same way as before. The caller never passes FirstMultiXactId, but the changed comments and the assertion were confusing, so I felt it's best to just leave it alone
>
> - bunch of comment changes & other cosmetic changes
>
> - I modified TrimMultiXact() to initialize the page corresponding to 'nextMulti', because if you just swapped the binary to the new one, and nextMulti was at a page boundary, it would not be initialized yet.
>
> If we want to backpatch this, and I think we need to because this fixes real bugs, we need to think through all the upgrade scenarios. I made the above-mentioned changes to TrimMultiXact(), but it doesn't fix all the problems.
>
> What happens if you replay the WAL generated with old binary, without this patch, with new binary? It's not good:
>
> LOG: database system was not properly shut down; automatic recovery in progress
> LOG: redo starts at 0/01766A68
> FATAL: could not access status of transaction 2048
> DETAIL: Could not read from file "pg_multixact/offsets/0000" at offset 8192: read too few bytes.
> CONTEXT: WAL redo at 0/05561030 for MultiXact/CREATE_ID: 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh)
> LOG: startup process (PID 3130184) exited with exit code 1
>
> This is because the WAL, created with old version, contains records like this:
>
> lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh)
> lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1
> lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers 2: 2831 (keysh) 2832 (keysh)
>
>
> When replaying that with the new version, replay of the CREATE_ID 2047 record tries to set the next multixid's offset, but the page hasn't been initialized yet. With the new version, the ZERO_OFF_PAGE 1 record would appear before the CREATE_ID 2047 record, but we can't change the WAL that already exists.
>
> - Heikki
> <v11-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch>

1
```
+ if (*offptr != offset)
+ {
+ /* should already be set to the correct value, or not at all */
+ Assert(*offptr == 0);
+ *offptr = offset;
+ MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+ }
```

This is a more like a question. Since pre-write should always happen, in theory *offptr != offset should never be true, why do we still need to handle the case instead of just assert(false)?

2
```
+ next = multi + 1;
+ if (next < FirstMultiXactId)
+ next = FirstMultiXactId;
```

next < FirstMultiXactId will only be true when next wraps around to 0, maybe deserve one-line comment to explain that.

3
```
+ if (*next_offptr != offset + nmembers)
+ {
+ /* should already be set to the correct value, or not at all */
+ Assert(*next_offptr == 0);
+ *next_offptr = offset + nmembers;
+ MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+ }
```

Should MultiXactMemberCtl be MultiXactOffsetCtl? As we are writing to the offset SLRU.

4
```
+# Another multixact test: loosing some multixact must not affect reading near
```

I guess “loosing” should be “losing”

5
```
+# multitransaction (to avaoid corener case 1).
```

Typo: avoid corner case

6
```
+# Verify thet recorded multi is readble, this call must not hang.
```

Typo: readable

7
```
+# One more multitransaction to effectivelt emit WAL record about next
```

Typo: effectivelt -> effectively

8
```
+ plan skip_all => 'Kill9 works unpredicatably on Windows’;
```

Typo: unpredicatably, no “a” after “c"

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2025-11-27 03:51:17 Re: [PATCH] psql: add size-based sorting options (O/o) for tables and indexes
Previous Message Tom Lane 2025-11-27 03:25:12 Re: Consistently use palloc_object() and palloc_array()