Re: Streamify more code paths

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: Streamify more code paths
Date: 2026-03-11 01:37:57
Message-ID: CABPTF7XFEOHpbju_pjCFHDffP_rWJU-405c6aoQdx4JjCOBimA@mail.gmail.com
Lists: pgsql-hackers

Hi Andres,

On Wed, Mar 11, 2026 at 7:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2026-03-10 19:28:29 +0900, Michael Paquier wrote:
> > On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> > > Here’s v5 of the patchset. The wal_logging_large patch has been
> > > removed, as no performance gains were observed in the benchmark runs.
> >
> > Looking at the numbers you are posting, it is harder to get excited
> > about the hash, gin, bloom_vacuum and wal_logging.
>
> It's perhaps worth emphasizing that, to allow real-world usage of direct IO,
> we'll need streaming implementations for most of these. Also, on Windows the
> OS-provided readahead is ... not aggressive, so you'll hit IO stalls much more
> frequently than you would on Linux (and some of the BSDs).
>
> It might be a good idea to run the benchmarks with debug_io_direct=data.
> That'll make them very slow, since the write side doesn't yet use AIO and thus
> will do a lot of synchronous writes, but it should still allow evaluating the
> gains from using read streams.
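For anyone reproducing this, switching a benchmark cluster between buffered and direct IO could look roughly like the sketch below. It assumes PostgreSQL 18 or later (for io_method), a cluster in $PGDATA that psql can connect to, and a filesystem that supports direct IO. bench_settings is a hypothetical helper, not part of PostgreSQL.

```shell
#!/bin/sh
# Hypothetical helper: print the ALTER SYSTEM statements for one benchmark
# configuration. Both settings are postmaster-level, so they only take
# effect after a server restart.
#   $1: debug_io_direct value ('' for buffered IO, 'data' for direct IO
#       on relation data files)
#   $2: io_method value ('sync', 'worker' or 'io_uring')
bench_settings() {
    printf "ALTER SYSTEM SET debug_io_direct = '%s';\n" "$1"
    printf "ALTER SYSTEM SET io_method = '%s';\n" "$2"
}

# Example: direct IO for data files, IO workers for AIO.
#   bench_settings data worker | psql -X -q
#   pg_ctl restart -D "$PGDATA" -m fast
```

Running the patched and unpatched builds with identical settings then keeps the comparison focused on the code paths being streamified.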
>
>
> The other thing that's kinda important to evaluate read streams is to test on
> higher-latency storage, even without direct IO. Many workloads don't benefit
> from AIO at all when run on a local NVMe SSD with < 10us latency, but are
> severely IO bound when run on a cloud storage disk with 0.5ms - 4ms latency.
>
>
> To be able to test such higher latencies locally, I've found it quite useful
> to use dm_delay above a fast disk. See [1].

Thanks for the tips! I currently don’t have access to a machine or
cloud instance with higher-latency storage (slower SSDs or HDDs). I’ll
try running the benchmarks with debug_io_direct=data and dm_delay, as
you suggested, to see whether the results change.

>
> > The worker method seems more efficient, which may show that we are
> > above the noise level.
>
> I think that's more likely to show that memory bandwidth, probably due to
> checksum computations, is a factor. The memory copy (from the kernel page
> cache, with buffered IO) and the checksum computations (when checksums are
> enabled) are parallelized by worker, but not by io_uring.
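One way to check whether the worker/io_uring gap comes from copying and checksumming, as suggested above, is to repeat each run with data checksums enabled and then disabled under both io_method values. The sketch below assumes PostgreSQL 18+ built with liburing (for io_uring), PGDATA set, the server binaries on PATH, and a hypothetical ./run_bench.sh that performs a single benchmark pass; with DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Sketch: run the same benchmark under io_method=worker and io_method=io_uring,
# with data checksums enabled and then disabled.
# Assumes PGDATA is set, the PostgreSQL 18+ binaries are on PATH, and
# ./run_bench.sh is a hypothetical script that runs one benchmark pass.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

bench_matrix() {
    for checksums in enable disable; do
        # pg_checksums requires a cleanly shut down cluster. It exits with an
        # error if checksums are already in the requested state, which is
        # harmless here because the script does not stop on errors.
        run pg_ctl stop -D "$PGDATA" -m fast
        run pg_checksums --"$checksums" -D "$PGDATA"
        run pg_ctl start -D "$PGDATA" -l server.log
        for method in worker io_uring; do
            run psql -X -q -c "ALTER SYSTEM SET io_method = '$method'"
            # io_method is a postmaster-level setting, so restart to apply it.
            run pg_ctl restart -D "$PGDATA" -m fast -l server.log
            run ./run_bench.sh "$method" "$checksums"
        done
    done
}
```

If the gap between the two methods shrinks noticeably with checksums disabled, that would support the memory-bandwidth explanation.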
>
>
> Greetings,
>
> Andres Freund
>
>
> [1]
>
> https://docs.kernel.org/admin-guide/device-mapper/delay.html
>
> Assuming /dev/md0 is mounted at /srv, and a delay of 1ms should be
> introduced for it:
>
> umount /srv &&
>   dmsetup create delayed --table "0 $(blockdev --getsz /dev/md0) delay /dev/md0 0 1" /dev/md0 &&
>   mount /dev/mapper/delayed /srv/
>
> To update the amount of delay to 3ms, the following can be used:
>
> dmsetup suspend delayed &&
>   dmsetup reload delayed --table "0 $(blockdev --getsz /dev/md0) delay /dev/md0 0 3" /dev/md0 &&
>   dmsetup resume delayed
>
> (I will often just update the delay to 0 for comparison runs, as that
> doesn't require remounting)
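Building on the dm-delay commands above, a small wrapper could make switching between delays less error-prone. This is only a sketch: it assumes the "delayed" mapping over /dev/md0 already exists and that it runs as root. delay_table and set_delay are hypothetical names; the optional third argument of delay_table overrides the sector count, which is mainly useful for a dry run. The table line has the same shape as in the commands above: start sector, length in sectors, "delay", device, offset, delay in ms.

```shell
#!/bin/sh
# Print a dm-delay table line that maps the whole device and delays IO by $2 ms.
#   $1: backing device, $2: delay in ms, $3: optional sector count
#       (defaults to the device size reported by blockdev --getsz)
delay_table() {
    dev=$1
    ms=$2
    sectors=${3:-$(blockdev --getsz "$dev")}
    printf '0 %s delay %s 0 %s\n' "$sectors" "$dev" "$ms"
}

# Change the delay of an existing mapping without unmounting it.
#   $1: mapping name, $2: backing device, $3: delay in ms
set_delay() {
    name=$1
    dev=$2
    ms=$3
    dmsetup suspend "$name" &&
        dmsetup reload "$name" --table "$(delay_table "$dev" "$ms")" "$dev" &&
        dmsetup resume "$name"
}

# Usage: baseline and delayed runs on the same mounted filesystem.
#   set_delay delayed /dev/md0 0
#   set_delay delayed /dev/md0 3
```

Setting the delay to 0 this way gives the comparison baseline without remounting.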

--
Best,
Xuneng
