| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: make mxidoff 64 bits |
| Date: | 2025-12-09 12:00:05 |
| Message-ID: | 3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 08/12/2025 17:43, Ashutosh Bapat wrote:
> On Sat, Dec 6, 2025 at 5:06 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 05/12/2025 15:42, Ashutosh Bapat wrote:
>>> And it makes the test faster by about a second or two on my laptop.
>>> Something on those lines or other is required to reduce the output
>>> from query_safe().
>>
>> Nice! That log bloat was the reason I bundled together the "COMMIT;
>> BEGIN; SELECT ...;" steps into one statement in the loop. Your solution
>> addresses it more directly.
>
> Now we can call query_safe() separately on each of those. That will be
> more readable and marginally less code.
Done.
>> I'm not entirely happy with the "Old" prefix here, because as you
>> pointed out, we might end up needing "older" or "oldold" in the future.
>> I couldn't come up with anything better though. "PreV19MultiXactOffset"
>> is quite a mouthful.
>
> How about MultiXactOffset32?
Ooh, I like that. It doesn't sound as nice for the other "old" prefixed
things though. So I changed OldMultiXactOffset to MultiXactOffset32, but
kept OldMultiXactReader, GetOldMultiXactIdSingleMember() et al. We can
live with that for now, and rename in the future if we'd need "oldold".
Committed with that and some other minor cleanups. Thanks everyone! This
patch has been brewing for a while :-).
There are some noncritical followups that I'd like to address, now that
we know that in v19 the pg_multixact files will be rewritten. That gives
us an opportunity to clean up some backwards-compatibility stuff. The
committed patch already cleaned up a bunch, but there's some more we
could do:
1. Currently, at multixid wraparound, MultiXactState->nextMXact goes to
0, which is invalid. All the readers must be prepared for that, and skip
over the 0. That's error-prone, we've already missed that a few times.
Let's change things so that the code that *writes*
MultiXactState->nextMXact skips over the zero already.
2. We currently don't persist 'oldestOffset' in the control file the
same way as 'oldestMultiXactId'. Instead, we look up the offset of the
oldestMultiXactId at startup, and keep that value in memory. Originally
that was because we missed the need for that and had to add the offset
wraparound protections in a minor release without changing the control
file format. But we could easily do it now.
With 64-bit offsets, it's actually less critical to persist the
oldestOffset. Previously, if we failed to look up the oldest offset
because the oldest multixid was invalid, it could lead to serious
trouble if the offsets then wrapped around and old offsets were
overwritten, but that won't happen anymore. Nevertheless, it leads to
unnecessarily aggressive vacuuming and some messages in the log.
At first I thought that the failure to look up the oldest offset should
no longer happen, because we don't need to support reading old 9.3 era
SLRUs anymore that were created before we added the offset wraparound
protection. But it's not so: it's still possible to have multixids with
invalid offsets in the 'offsets' SLRU on a crash. Such multixids won't
be referenced from anywhere in the heap, but I think they could later
become the oldest multixid, and we would fail to look up its offset.
Persisting the oldest offset doesn't fully fix that problem, because
advancing the oldest offset is done by looking up the oldest multixid's
offset anyway.
3. I think we should turn some of the assertions in
GetMultiXactIdMembers() into ereports(ERROR) calls. I included those
changes in my patch version 29 [1], as a separate patch, but I didn't
commit that yet.
4. Compressing the offsets, per discussion. It doesn't really seem worth
to me and I don't intend to work on it, but if someone wants to do it,
now would be the time, so that we don't need to have upgrade code to
deal with yet another format.
- Heikki
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2025-12-09 12:31:19 | Re: Fix LOCK_TIMEOUT handling in slotsync worker |
| Previous Message | shveta malik | 2025-12-09 11:45:24 | Re: Improve pg_sync_replication_slots() to wait for primary to advance |