Re: Streamify more code paths

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: Streamify more code paths
Date: 2026-03-12 04:39:32
Message-ID: CABPTF7Wz8OCcr2f1CSdt-jRHfJAAdXZwwBokBD4baZAPyK7CgA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 12, 2026 at 11:42 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Mar 12, 2026 at 06:33:08AM +0900, Michael Paquier wrote:
> > Thanks for doing that. On my side, I am going to look at the gin and
> > hash vacuum paths first with more testing as these don't use a custom
> > callback. I don't think that I am going to need a lot of convincing,
> > but I'd rather produce some numbers myself before doing something.
> > I'll tweak a mounting point with the delay trick, as well.
>
> While debug_io_direct has been helping a bit, the trick for the delay
> to throttle the IO activity has helped much more with my runtime
> numbers. I have mounted a separate partition with a delay of 5ms,
> disabled checksums (this part did not make a real difference), and
> evicted shared buffers for relation and indexes before the VACUUM.
>
> Then I got better numbers. Here is an extract:
> - worker=3:
> gin_vacuum (100k tuples) base= 1448.2ms patch= 572.5ms 2.53x
> ( 60.5%) (reads=175→104, io_time=1382.70→506.64ms)
> gin_vacuum (300k tuples) base= 3728.0ms patch= 1332.0ms 2.80x
> ( 64.3%) (reads=486→293, io_time=3669.89→1266.27ms)
> bloom_vacuum (100k tuples) base= 21826.8ms patch= 17220.3ms 1.27x
> ( 21.1%) (reads=485→117, io_time=4773.33→270.56ms)
> bloom_vacuum (300k tuples) base= 67054.0ms patch= 53164.7ms 1.26x
> ( 20.7%) (reads=1431.5→327.5, io_time=13880.2→381.395ms)
> - io_uring:
> gin_vacuum (100k tuples) base= 1240.3ms patch= 360.5ms 3.44x
> ( 70.9%) (reads=175→104, io_time=1175.35→299.75ms)
> gin_vacuum (300k tuples) base= 2829.9ms patch= 642.0ms 4.41x
> ( 77.3%) (reads=465.5→293, io_time=2768.46→579.04ms)
> bloom_vacuum (100k tuples) base= 22121.7ms patch= 17532.3ms 1.26x
> ( 20.7%) (reads=485→117, io_time=4850.46→285.28ms)
> bloom_vacuum (300k tuples) base= 67058.0ms patch= 53118.0ms 1.26x
> ( 20.8%) (reads=1431.5→327.5, io_time=13870.9→305.44ms)
>
> The higher the number of tuples, the better the performance for each
> individual operation, but the tests take a much longer time (tens of
> seconds vs tens of minutes). For GIN, the numbers can be quite good
> once these reads are pushed. For bloom, the runtime is improved, and
> the IO numbers are much better.
>
> At the end, I have applied these two parts. Remains now the hash
> vacuum and the two parts for pgstattuple.
> --
> Michael

Thanks for running the benchmarks and pushing!

Here are the results of my tests with debug_io_direct and the delay trick:

-- io_uring, medium size

bloom_vacuum_medium base= 8355.2ms patch= 715.0ms 11.68x
( 91.4%) (reads=4732→1056, io_time=7699.47→86.52ms)
pgstattuple_medium base= 4012.8ms patch= 213.7ms 18.78x
( 94.7%) (reads=2006→2006, io_time=4001.66→200.24ms)
pgstatindex_medium base= 5490.6ms patch= 37.9ms 144.88x
( 99.3%) (reads=2745→173, io_time=5481.54→7.82ms)
hash_vacuum_medium base= 34483.4ms patch= 2703.5ms 12.75x
( 92.2%) (reads=19166→3901, io_time=31948.33→308.05ms)
wal_logging_medium base= 7778.6ms patch= 7814.5ms 1.00x
( -0.5%) (reads=2857→2845, io_time=11.84→11.45ms)

-- worker, medium size
bloom_vacuum_medium base= 8376.2ms patch= 747.7ms 11.20x
( 91.1%) (reads=4732→1056, io_time=7688.91→65.49ms)
pgstattuple_medium base= 4012.7ms patch= 339.0ms 11.84x
( 91.6%) (reads=2006→2006, io_time=4002.23→49.99ms)
pgstatindex_medium base= 5490.3ms patch= 38.3ms 143.23x
( 99.3%) (reads=2745→173, io_time=5480.60→16.24ms)
hash_vacuum_medium base= 34638.4ms patch= 2940.2ms 11.78x
( 91.5%) (reads=19166→3901, io_time=31881.61→242.01ms)
wal_logging_medium base= 7440.1ms patch= 7434.0ms 1.00x
( 0.1%) (reads=2861→2825, io_time=10.62→10.71ms)

-- Setting read delay only
sudo dmsetup reload "$DM_DELAY_DEV" --table "0 $size delay $dev 0 $ms $dev 0 0"
Setting dm_delay on delayed to 2ms read / 0ms write
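To make the dm-delay table line less error-prone to assemble, it can be
built by a small helper. This is only a sketch; `make_delay_table` is a
hypothetical helper (not part of my script), and the sector count below is
just the size of the 50GB loop device from the setup further down. Note
that `dmsetup reload` only fills the inactive table slot; a
suspend/resume cycle is needed to actually activate the new table.

```shell
# Build a dm-delay table line with separate read and write delays.
# dm-delay syntax: <start> <size> delay <dev> <offset> <read_ms> [<dev> <offset> <write_ms>]
make_delay_table() {
  dev=$1; size=$2; rd=$3; wr=$4
  printf '0 %s delay %s 0 %s %s 0 %s' "$size" "$dev" "$rd" "$dev" "$wr"
}

table=$(make_delay_table /dev/loop0 102400000 2 0)
echo "$table"
# -> 0 102400000 delay /dev/loop0 0 2 /dev/loop0 0 0

# To apply it (reload fills the inactive slot; suspend/resume swaps it in):
#   sudo dmsetup reload delayed --table "$table"
#   sudo dmsetup suspend delayed
#   sudo dmsetup resume delayed
```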

After setting the write delay to 0ms, I can observe more pronounced
speedups overall. Since the vacuum operation is write-intensive,
delaying writes can dominate the runtime and mask the read-path
improvement we are measuring. Zeroing the write delay also shortens
the overall runtime of the test.

-- wal_logging
The wal_logging patch does not seem to benefit from streamification in
this configuration either.

-- Delay setup
For anyone wanting to reproduce the results with a simulated-latency
device, here is the setup I used.

1. Create a 50GB file-backed block device (enough for PG data + indexes)

sudo dd if=/dev/zero of=/srv/delay_disk.img bs=1M count=50000 status=progress
sudo losetup /dev/loop0 /srv/delay_disk.img

2. Create the dm_delay device with a 2ms delay

sudo dmsetup create delayed \
  --table "0 $(sudo blockdev --getsz /dev/loop0) delay /dev/loop0 0 2"
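To confirm the mapping took effect, the active table can be inspected
with `dmsetup table`; the delay target line has the shape
`<start> <length> delay <device> <offset> <delay_ms>`, with the device
printed as major:minor (the exact numbers below are illustrative and
will differ on your machine):

```shell
# Show the active table for the device created above.
sudo dmsetup table delayed
# Illustrative shape of the output:
#   0 102400000 delay 7:0 0 2
```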

3. Format and mount it

sudo mkfs.ext4 /dev/mapper/delayed
sudo mkdir -p /srv/pg_delayed
sudo mount /dev/mapper/delayed /srv/pg_delayed
sudo chown $(whoami) /srv/pg_delayed
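When done benchmarking, the stack can be torn down in reverse order. A
sketch, assuming the names used in the steps above:

```shell
# Tear down: unmount, remove the dm mapping, detach the loop device,
# and delete the backing file.
sudo umount /srv/pg_delayed
sudo dmsetup remove delayed
sudo losetup -d /dev/loop0
sudo rm /srv/delay_disk.img
```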

4. Run benchmark with WORKROOT pointing to the delayed device

WORKROOT=/srv/pg_delayed SIZES=medium REPS=3 \
./run_streaming_benchmark.sh --baseline --io-method io_uring \
--test gin_vacuum --direct-io --io-delay 2 \
<the targeted patch>
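As Michael mentioned, shared buffers for the relation and its indexes
should be evicted before each VACUUM so the reads actually hit the
delayed device. One way to sketch this, assuming PostgreSQL 18's
pg_buffercache_evict_relation() and a hypothetical benchmark table
named gin_test (on older versions, a server restart or per-buffer
pg_buffercache_evict() calls work too):

```shell
# Evict buffers for the table and all of its indexes before the VACUUM run.
psql -c "CREATE EXTENSION IF NOT EXISTS pg_buffercache;"
psql -c "SELECT pg_buffercache_evict_relation(c.oid)
           FROM pg_class c
          WHERE c.oid = 'gin_test'::regclass
             OR c.oid IN (SELECT indexrelid FROM pg_index
                           WHERE indrelid = 'gin_test'::regclass);"
```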

--
Best,
Xuneng

Attachment Content-Type Size
v6-0003-Streamify-hash-index-VACUUM-primary-bucket-page-r.patch application/x-patch 5.5 KB
v6-0002-Streamify-heap-bloat-estimation-scan.-Introduce-a.patch application/x-patch 6.4 KB
v6-0005-Use-streaming-read-API-in-pgstatindex-functions.patch application/x-patch 4.5 KB
v6-0004-Streamify-log_newpage_range-WAL-logging-path.patch application/x-patch 2.4 KB
run_streaming_benchmark.sh application/x-sh 32.6 KB
