| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Streamify more code paths |
| Date: | 2026-02-05 04:01:52 |
| Message-ID: | CABPTF7VUaRnvsXqa+628YkuR4oPVRr1mR2seXTkxabfiqQ3NHw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Tue, Dec 30, 2025 at 10:43 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Dec 30, 2025 at 9:51 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > Thanks for looking into this.
> >
> > On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > > >
> > > > Hi,
> > > > >
> > > > > Two more to go:
> > > > > patch 5: Streamify log_newpage_range() WAL logging path
> > > > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > > > >
> > > > > Benchmarks will be conducted soon.
> > > > >
> > > >
> > > > v6 in the last message had a problem and was not updated correctly. I'm
> > > > attaching the right one again. Sorry for the noise.
> > >
> > > 0003 and 0006:
> > >
> > > You need to add 'StatApproxReadStreamPrivate' and
> > > 'HashBulkDeleteStreamPrivate' to the typedefs.list.
> >
> > Done.
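
Concretely, that means adding the two new struct names to
src/tools/pgindent/typedefs.list (which is kept alphabetically sorted),
so that pgindent recognizes them as type names when reformatting:

    HashBulkDeleteStreamPrivate
    StatApproxReadStreamPrivate
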
> >
> > > 0005:
> > >
> > > @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
> > > nbufs = 0;
> > > while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
> > > {
> > > - Buffer buf = ReadBufferExtended(rel, forknum, blkno,
> > > - RBM_NORMAL, NULL);
> > > + Buffer buf = read_stream_next_buffer(stream, NULL);
> > > +
> > > + if (!BufferIsValid(buf))
> > > + break;
> > >
> > > We are loosening a check here; there should not be an invalid buffer in
> > > the stream until endblk. I think you can remove this BufferIsValid()
> > > check, so that we can learn if something goes wrong.
> >
> > My earlier concern about not adding an assert at the end of the stream
> > was the potential early break here:
> >
> > /* Nothing more to do if all remaining blocks were empty. */
> > if (nbufs == 0)
> > break;
> >
> > After looking more closely, this turned out to be a misunderstanding on
> > my part: the inner loop can only exit with nbufs == 0 once blkno has
> > reached endblk, so the stream is fully consumed before the early break.
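
For the archives, here is roughly the shape of the batching loop after
the change. This is a sketch only: the stream setup via the generic
block-range callback is my assumption about the wiring, and anything
not in the posted diff is illustrative:

    /* Stream covering [startblk, endblk) in order. */
    BlockRangeReadStreamPrivate p;
    ReadStream *stream;

    p.current_blocknum = startblk;
    p.last_exclusive = endblk;
    stream = read_stream_begin_relation(READ_STREAM_FULL,
                                        NULL,   /* default strategy */
                                        rel,
                                        forknum,
                                        block_range_read_stream_cb,
                                        &p,
                                        0);

    nbufs = 0;
    while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
    {
        /*
         * The callback supplies every block up to endblk, so the stream
         * cannot legitimately return an invalid buffer before then; with
         * the BufferIsValid() check gone, a bug here fails loudly instead
         * of being silently skipped.
         */
        Buffer      buf = read_stream_next_buffer(stream, NULL);

        LockBuffer(buf, BUFFER_LOCK_SHARE);
        if (!PageIsNew(BufferGetPage(buf)))
            bufpack[nbufs++] = buf;
        else
            UnlockReleaseBuffer(buf);
        blkno++;
    }
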
> >
> > > 0006:
> > >
> > > You can use read_stream_reset() instead of read_stream_end(), then you
> > > can use the same stream with different variables, I believe this is
> > > the preferred way.
> > >
> > > Rest LGTM!
> > >
> >
> > Yeah, reset seems the more appropriate approach here.
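
For 0006, a sketch of what that reuse looks like. The private-struct
field names and the callback name below are illustrative, not the
patch's actual identifiers:

    HashBulkDeleteStreamPrivate p;
    ReadStream *stream;

    p.next_bucket = 0;                  /* illustrative field names */
    p.max_bucket = cur_maxbucket;
    stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
                                        bstrategy,  /* vacuum's strategy */
                                        rel,
                                        MAIN_FORKNUM,
                                        hash_bucket_stream_cb,  /* illustrative */
                                        &p,
                                        0);

    while (cur_bucket <= cur_maxbucket)
    {
        /* Primary bucket page arrives via the stream. */
        bucket_buf = read_stream_next_buffer(stream, NULL);
        /* ... vacuum the bucket's overflow chain ... */
        cur_bucket++;
    }

    if (split_occurred)
    {
        /*
         * Retarget the same stream at the newly added buckets:
         * read_stream_reset() releases queued buffers and restarts the
         * callback from the updated state, avoiding an end/begin cycle
         * for every pass.
         */
        p.next_bucket = cur_bucket;
        p.max_bucket = new_maxbucket;
        read_stream_reset(stream);
        /* ... loop over the new buckets ... */
    }
    read_stream_end(stream);
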
> >
>
> I've run pgindent using the updated typedefs.list.
>
I've completed benchmarking of the v4 streaming read patches across
three I/O methods (io_uring, sync, worker). Tests were run with a cold
cache on large datasets.
--- Settings ---
shared_buffers = '8GB'
effective_io_concurrency = 200
io_method = $IO_METHOD
io_workers = $IO_WORKERS
io_max_concurrency = $IO_MAX_CONCURRENCY
track_io_timing = on
autovacuum = off
checkpoint_timeout = 1h
max_wal_size = 10GB
max_parallel_workers_per_gather = 0
--- Machine ---
CPU: 48-core
RAM: 256 GB DDR5
Disk: 2 x 1.92 TB NVMe SSD
--- Executive Summary ---
The patches provide significant benefits for I/O-bound sequential
operations, with the greatest improvements seen when using
asynchronous I/O methods (io_uring and worker). The synchronous I/O
mode shows reduced but still meaningful gains.
--- Results by I/O Method ---
Best Results: io_method=worker
bloom_scan: 4.14x (75.9% faster); 93% fewer reads
pgstattuple: 1.59x (37.1% faster); 94% fewer reads
hash_vacuum: 1.05x (4.4% faster); 80% fewer reads
gin_vacuum: 1.06x (5.6% faster); 15% fewer reads
bloom_vacuum: 1.04x (3.9% faster); 76% fewer reads
wal_logging: 0.98x (-2.5%, neutral/slightly slower); no change in reads
io_method=io_uring
bloom_scan: 3.12x (68.0% faster); 93% fewer reads
pgstattuple: 1.50x (33.2% faster); 94% fewer reads
hash_vacuum: 1.03x (3.3% faster); 80% fewer reads
gin_vacuum: 1.02x (2.1% faster); 15% fewer reads
bloom_vacuum: 1.03x (3.4% faster); 76% fewer reads
wal_logging: 1.00x (-0.5%, neutral); no change in reads
io_method=sync (baseline comparison)
bloom_scan: 1.20x (16.4% faster); 93% fewer reads
pgstattuple: 1.10x (9.0% faster); 94% fewer reads
hash_vacuum: 1.01x (0.8% faster); 80% fewer reads
gin_vacuum: 1.02x (1.7% faster); 15% fewer reads
bloom_vacuum: 1.03x (2.8% faster); 76% fewer reads
wal_logging: 0.99x (-0.7%, neutral); no change in reads
--- Observations ---
Async I/O amplifies streaming benefits: the same patches show up to 3-4x
improvement (bloom_scan) with worker/io_uring vs 1.2x with sync.
I/O operation reduction is consistent: All modes show the same ~93-94%
reduction in I/O operations for bloom_scan and pgstattuple.
VACUUM operations show modest gains: despite sizeable reductions in read
operations (15-80%), wall-clock improvements are small (roughly 1-6%),
since VACUUM carries substantial CPU overhead (tuple processing, index
maintenance, WAL logging).
log_newpage_range shows no benefit: the patch is neutral (0.98-1.00x across I/O methods).
--
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| run_streaming_benchmark.sh | text/x-sh | 20.8 KB |