Re: POC: make mxidoff 64 bits

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-10-30 16:17:00
Message-ID: CACG=ezb53xpK39y2oe8H5BAAEnrtpqAjTF_xKoFZh8z6PJY7zA@mail.gmail.com
Lists: pgsql-hackers

On Thu, 30 Oct 2025 at 12:10, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

>
> Oh I see, the 'base' is not necessarily the base offset of the first
> multixact on the page, it's the base offset of the first multixid that
> is written to the page. And the (short) offsets can be negative. That's
> a frighteningly clever encoding scheme. One upshot of that is that WAL
> redo might construct the page with a different 'base'. I guess that
> works, but it scares me. Could we come up with a more deterministic scheme?
>

Definitely! The most stable approach is the one we had before, which
used actual 64-bit offsets in the SLRU. To be honest, I'm completely
happy with it. After all, what's most important to me is having 64-bit
xids in Postgres, and this patch is a step towards that goal.

PFA v20, which returns to using actual 64-bit offsets in the on-disk
SLRU segments.

Fortunately, now that I've separated reading and writing offsets into
different functions, switching from one implementation to another is
easy to do.

Here's a quick overview of the current state of the patch:
1) Access to offsets now goes through separate calls:
MXOffsetWrite/MXOffsetRead.
2) I abandoned byte juggling in pg_upgrade and moved to logic that
replicates how multixact.c works with offsets.
3) As a result, the upgrade issue came down to a correct implementation
of MXOffsetWrite/MXOffsetRead.
4) The only remaining question is the on-disk representation
of 64-bit offsets in SLRU segments.

My thoughts on point (4).

Using 32-bit offsets + some kind of packing:
Pros:
+ Reduces the total disk space used by the segments; ideally, it stays
almost the same as before.
Cons:
- Reduces reliability (losing part of a page will most likely result in
losing the entire page).
- Complicates the code, especially considering that entries may be
written to the page in random order.

Using 64-bit offsets in SLRU:
Pros:
+ Easy to implement/transparent logic.
Cons:
- Increases the amount of disk space used.

In terms of speed, I'm not sure which will be faster. On the one hand,
64-bit offsets eliminate the need for extra calculations and branching.
On the other hand, the amount of data will increase.

I am not opposed to either of these options, as our primary goal is
getting 64-bit offsets. However, I prefer the approach using full 64-bit
offsets in SLRU, because it is clearer and, shall we say, more robust.
Yes, it will increase the number of segments; however, this is not heap
data for a table. Under typical circumstances, there should not be too
many such segments.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v20-0005-TEST-Add-test-for-64-bit-mxoff-in-pg_upgrade.patch.txt text/plain 11.4 KB
v20-0003-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch application/octet-stream 4.9 KB
v20-0001-Use-64-bit-multixact-offsets.patch application/octet-stream 29.2 KB
v20-0004-TEST-bump-catversion.patch.txt text/plain 775 bytes
v20-0002-Add-pg_upgarde-for-64-bit-multixact-offsets.patch application/octet-stream 34.2 KB
