| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Maxim Orlov <orlovmg(at)gmail(dot)com> |
| Cc: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: make mxidoff 64 bits |
| Date: | 2025-11-12 13:00:02 |
| Message-ID: | 669fa18c-82f9-4f56-86a9-75ba7bc4e7dc@iki.fi |
| Lists: | pgsql-hackers |
On 07/11/2025 18:03, Maxim Orlov wrote:
> I tried finding out how long it would take to convert a big number of
> segments. Unfortunately, I only have access to a very old machine right
> now. It took me 7 hours to generate this much data on my old
> Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 16 Gb of RAM.
>
> Here are my rough measurements:
>
> HDD
> $ sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
> $ time pg_upgrade
> ...
> real 4m59.459s
> user 0m19.974s
> sys 0m13.640s
>
> SSD
> $ sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
> $ time pg_upgrade
> ...
> real 4m52.958s
> user 0m19.826s
> sys 0m13.624s
>
> I aim to get access to more modern stuff and check it all out there.
Thanks, I also did some perf testing on my laptop. I wrote a little
helper function to consume multixids, and used it to create a v17
cluster with 100 million multixids. See attached
consume-mxids.patch.txt. I then ran pg_upgrade on that, and measured how
long the pg_multixact conversion part of pg_upgrade took. It took about
1.2 s on my laptop. Extrapolating from that, converting 1 billion
multixids would take 12 s. These were very simple multixacts with just
one member each, though; realistic multixacts with more members would
presumably take a little longer.
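(For reference, each multixact member is one (xid, status) pair stored in
pg_multixact/members; the struct below is the definition from
src/include/access/multixact.h. The test case above creates one such member
per multixact, whereas e.g. several concurrent FOR KEY SHARE lockers on the
same row create one member per locker, which is why realistic multixacts
presumably take a bit longer to convert.)

/* From src/include/access/multixact.h: one member of a multixact */
typedef struct MultiXactMember
{
	TransactionId xid;			/* the member transaction */
	MultiXactStatus status;		/* lock/update mode it holds */
} MultiXactMember;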
In any case, I think we're in an acceptable ballpark here.
There's some very low-hanging fruit though: profiling with 'linux-perf'
suggested that a lot of CPU time was spent simply on the function call
overhead of GetOldMultiXactIdSingleMember, SlruReadSwitchPage,
RecordNewMultiXact, and SlruWriteSwitchPage for each multixact. I added an
inlined fast path to SlruReadSwitchPage and SlruWriteSwitchPage to
eliminate the function call overhead of those in the common case where no
page switch is needed. With that, the 100 million mxid test case I used
went from 1.2 s to 0.9 s. We could optimize this further, but I think
this is good enough.
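To illustrate the idea (this is just a sketch, not the patch code; the
state struct, field names, and the SlruReadSwitchPageSlow helper are made
up for illustration), the fast path amounts to a static inline wrapper that
only calls into slru_io.c when the requested page differs from the one
already loaded:

/* Hypothetical reader state; field names are illustrative only */
typedef struct SlruReadState
{
	int64		cur_pageno;		/* page currently loaded into buf, or -1 */
	char	   *buf;			/* one SLRU page worth of data */
} SlruReadState;

/* out-of-line slow path: switches segment files and reads the new page */
extern void SlruReadSwitchPageSlow(SlruReadState *state, int64 pageno);

/*
 * Inlined fast path: if the caller asks for the page that's already loaded,
 * which is the overwhelmingly common case when walking offsets sequentially,
 * return without any function call into slru_io.c.
 */
static inline void
SlruReadSwitchPage(SlruReadState *state, int64 pageno)
{
	if (likely(state->cur_pageno == pageno))
		return;
	SlruReadSwitchPageSlow(state, pageno);
}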
Some other changes since patch set v23:
- Rebased. I committed the wraparound bug fixes.
- I added an SlruFileName() helper function to slru_io.c, and support
for reading SLRUs with long_segment_names==true. It's not needed
currently, but it seemed like a weird omission. AllocSlruRead() actually
left 'long_segment_names' uninitialized, which is error-prone. We
could've just documented that, but it seems just as easy to support it.
(See the sketch after this list.)
- I split the multixact_internal.h header out into a separate commit, to
make it clearer which changes are related to 64-bit offsets.
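For reference, the backend's SlruFileName() in
src/backend/access/transam/slru.c uses 4 hex digits for short segment
names and 15 for long ones; a minimal sketch of what a slru_io.c-side
helper along those lines could look like (parameter names here are
illustrative, not necessarily what the patch uses):

/*
 * Build the file name of an SLRU segment, modeled on SlruFileName() in
 * src/backend/access/transam/slru.c: short segment names are 4 hex digits,
 * long ones (long_segment_names == true) are 15 hex digits.
 */
static int
SlruFileName(const char *dir, bool long_segment_names, int64 segno,
			 char path[MAXPGPATH])
{
	if (long_segment_names)
		return snprintf(path, MAXPGPATH, "%s/%015llX", dir, (long long) segno);
	else
		return snprintf(path, MAXPGPATH, "%s/%04X", dir, (int) segno);
}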
I kept all the new test cases for now. We need to decide which ones are
worth keeping, and polish and speed up the ones we decide to keep.
I'm getting one failure from the pg_upgrade/008_mxoff test:
> [14:43:38.422](0.530s) not ok 26 - dump outputs from original and restored regression databases match
> [14:43:38.422](0.000s) # Failed test 'dump outputs from original and restored regression databases match'
> # at /home/heikki/git-sandbox/postgresql/src/test/perl/PostgreSQL/Test/Utils.pm line 801.
> [14:43:38.422](0.000s) # got: '1'
> # expected: '0'
> === diff of /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/008_mxoff/data/tmp_test_AC6A/oldnode_6_dump.sql_adjusted and /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/008_mxoff/data/tmp_test_AC6A/newnode_6_dump.sql_adjusted
> === stdout ===
> --- /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/008_mxoff/data/tmp_test_AC6A/oldnode_6_dump.sql_adjusted 2025-11-12 14:43:38.030399957 +0200
> +++ /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/008_mxoff/data/tmp_test_AC6A/newnode_6_dump.sql_adjusted 2025-11-12 14:43:38.314399819 +0200
> @@ -2,8 +2,8 @@
> -- PostgreSQL database dump
> --
> \restrict test
> --- Dumped from database version 17.6
> --- Dumped by pg_dump version 17.6
> +-- Dumped from database version 19devel
> +-- Dumped by pg_dump version 19devel
> SET statement_timeout = 0;
> SET lock_timeout = 0;
> SET idle_in_transaction_session_timeout = 0;
> === stderr ===
> === EOF ===
> [14:43:38.425](0.004s) # >>> case #6
I ran the test with:
(rm -rf build/testrun/ build/tmp_install/;
olddump=/tmp/olddump-regress.sql oldinstall=/home/heikki/pgsql.17stable/
meson test -C build --suite setup --suite pg_upgrade)
- Heikki
| Attachment | Content-Type | Size |
|---|---|---|
| consume-mxids.patch.txt | text/plain | 3.4 KB |
| v24-0001-Move-pg_multixact-SLRU-page-format-definitions-t.patch | text/x-patch | 10.3 KB |
| v24-0002-Use-64-bit-multixact-offsets.patch | text/x-patch | 37.9 KB |
| v24-0003-Add-pg_upgrade-for-64-bit-multixact-offsets.patch | text/x-patch | 33.3 KB |
| v24-0004-Remove-oldestOffset-oldestOffsetKnown-from-multi.patch | text/x-patch | 6.1 KB |
| v24-0005-TEST-bump-catversion.patch | text/x-patch | 796 bytes |
| v24-0006-TEST-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch | text/x-patch | 4.9 KB |
| v24-0007-TEST-Add-test-for-wraparound-of-next-new-multi-i.patch | text/x-patch | 5.2 KB |
| v24-0008-TEST-Add-test-for-64-bit-mxoff-in-pg_upgrade.patch | text/x-patch | 12.0 KB |