Re: Streamify more code paths

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: Streamify more code paths
Date: 2026-03-15 02:51:05
Message-ID: CABPTF7U4CvFp55wVg51b8b24J1GQfjB62SzvfXY0ZgjaRWPYfA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 14, 2026 at 5:56 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Mar 13, 2026 at 10:39:52AM +0800, Xuneng Zhou wrote:
> > Thanks for fixing this and for taking the time to review and test
> > the patches.
>
> Looking at the rest, I have produced some numbers:
> pgstattuple_small (20k tuples, io_uring) base= 60839.9ms
> patch=10949.9ms 5.56x ( 82.0%) (reads=4139->260,
> io_time=49616.97->55.25ms)
> pgstattuple_small (20k tuples, worker=3) base= 60577.5ms
> patch=11470.0ms 5.28x ( 81.1%) (reads=4139->260,
> io_time=49359.79->69.60ms)
> hash_vacuum (1M tuples, io_uring) base=199929.0ms patch=161747.0ms
> 1.24x ( 19.1%) (reads=4665->1615, io_time=47084.8->9925.77ms)
> hash_vacuum (1M tuples, worker=12) base=203417.0ms patch=161687.0ms
> 1.26x ( 20.5%) (reads=4665->1615, io_time=48356.3->9917.24ms)
>
> The hash vacuum numbers are less amazing here than yours. Trying out
> various configurations does not change the results much (I was puzzled
> for a couple of hours that I did not see any performance impact but
> forgot the eviction of the index pages from the shared buffers, that
> influences the numbers to what I have here), but I'll take it anyway.

My guess is that the results are influenced by the write delay. Vacuum
operations can be write-intensive, so when both read and write delays
are set to 2-5 ms, a large portion of the runtime may be spent on
writes. By Amdahl's Law, the overall improvement from optimizing a
single component is limited by the fraction of the total execution
time that component actually contributes. In this case, the potential
speedup from streaming the read path could be masked by the time spent
performing writes.
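To make the bound concrete, here is a small sketch of the Amdahl
calculation (the 25%/95% fractions below are illustrative, not taken
from the benchmark runs):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# If reads are only 25% of the runtime (writes dominate), even a huge
# read-path speedup caps the overall gain near 1/(1 - 0.25) = 1.33x.
print(round(amdahl_speedup(0.25, 100), 2))
# If reads are 95% of the runtime, the same read-path speedup pays off.
print(round(amdahl_speedup(0.95, 100), 2))
```

This is why shrinking the write fraction (next paragraph) exposes the
read-streaming gain.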

To investigate this, I added a new option, write-delay. When it is set
to zero, the benchmark simulates a system with a fast write device and
a slow read device, reducing the proportion of time spent on writes.
Admittedly, this setup is somewhat artificial: we would not normally
expect such a large discrepancy between read and write performance on
real systems.

-- worker 12, write-delay 2 ms
hash_vacuum_medium base= 33743.2ms patch= 27371.3ms 1.23x
( 18.9%) (reads=4662→1612, read_time=8242.51→1725.03ms,
writes=12689→12651, write_time=25144.87→25041.75ms)

-- worker 12, write-delay 0 ms
hash_vacuum_medium base= 8601.1ms patch= 2234.0ms 3.85x
( 74.0%) (reads=4662→1612, read_time=8021.65→1637.87ms,
writes=12689→12651, write_time=337.38→288.15ms)

To better understand the behavior, the latest version of the script
splits the I/O time into read time and write time. This lets us
directly observe their respective contributions and how they change
across runs. A further improvement would be to report the speedup for
the read and write components separately, making it easier to see
where, and by how much, the performance gains actually occur.
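That per-component reporting would amount to a few extra ratios; as a
sketch, plugging in the figures from the worker=12, write-delay=0 run
above:

```python
def component_speedups(base, patch):
    """Per-component speedup ratios (base time / patched time)."""
    return {k: base[k] / patch[k] for k in base}

# Figures reported for the worker=12, write-delay=0 hash_vacuum_medium run.
base  = {"total_ms": 8601.1, "read_ms": 8021.65, "write_ms": 337.38}
patch = {"total_ms": 2234.0, "read_ms": 1637.87, "write_ms": 288.15}

for name, ratio in component_speedups(base, patch).items():
    print(f"{name}: {ratio:.2f}x")
# The read path improves ~4.90x while writes stay ~1.17x,
# which combine into the ~3.85x overall figure.
```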

> One thing that I was wondering for the pgstattuple patch is if we
> should have "scanned" put outside the private data of the callback as
> we get back to the main loop once we know that the page is not
> all-visible, so we could increment the counter in the main loop
> instead of the callback. Now I get that you have done that as it
> feels cleaner for the "default" return path of the callback, while the
> logic remains the same, so I have kept it as-is at the end, tweaked a
> few things, and applied this one.

Thanks for the review and for applying it. My reasoning for putting
scanned inside the callback was to keep all per-block accounting in
one place: the callback is already where the skip-vs-read decision is
made, so it seemed natural to count reads there as well. But I agree
the main loop would also be a clean spot for it.

> I have not been able to review yet the patch for the hash VACUUM
> proposal, which would be the last one.
> --
> Michael

--
Best,
Xuneng

Attachment Content-Type Size
run_streaming_benchmark.sh text/x-sh 34.9 KB
