Re: Use streaming read API in ANALYZE

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-04-02 07:23:55
Message-ID: CAN55FZ2_4hgTRK=TujRj=J2VYk8Rm8=H1KCTCeKm8zYN1zd3Nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for the review!

On Wed, 27 Mar 2024 at 23:15, Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Tue, Mar 26, 2024 at 02:51:27PM +0300, Nazir Bilal Yavuz wrote:
> > Hi,
> >
> > On Wed, 28 Feb 2024 at 14:42, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> > >
> > >
> > > The new version of the streaming read API [1] is posted. I updated the
> > > streaming read API changes patch (0001), using the streaming read API
> > > in ANALYZE patch (0002) remains the same. This should make it easier
> > > to review as it can be applied on top of master
> > >
> > >
> >
> > The new version of the streaming read API is posted [1]. I rebased the
> > patch on top of master and v9 of the streaming read API.
> >
> > There is a minimal change in the 'using the streaming read API in ANALYZE
> > patch (0002)', I changed STREAMING_READ_FULL to STREAMING_READ_MAINTENANCE
> > to copy exactly the same behavior as before. Also, some benchmarking
> > results:
> >
> > I created a 22 GB table and set the size of shared buffers to 30GB, the
> > rest is default.
> >
> > ╔═══════════════════════════╦═════════════════════╦════════════╗
> > ║ ║ Avg Timings in ms ║ ║
> > ╠═══════════════════════════╬══════════╦══════════╬════════════╣
> > ║ ║ master ║ patched ║ percentage ║
> > ╠═══════════════════════════╬══════════╬══════════╬════════════╣
> > ║ Both OS cache and ║ ║ ║ ║
> > ║ shared buffers are clear ║ 513.9247 ║ 463.1019 ║ %9.9 ║
> > ╠═══════════════════════════╬══════════╬══════════╬════════════╣
> > ║ OS cache is loaded but ║ ║ ║ ║
> > ║ shared buffers are clear ║ 423.1097 ║ 354.3277 ║ %16.3 ║
> > ╠═══════════════════════════╬══════════╬══════════╬════════════╣
> > ║ Shared buffers are loaded ║ ║ ║ ║
> > ║ ║ 89.2846 ║ 84.6952 ║ %5.1 ║
> > ╚═══════════════════════════╩══════════╩══════════╩════════════╝
> >
> > Any kind of feedback would be appreciated.
>
> Thanks for working on this!
>
> A general review comment: I noticed you have the old streaming read
> (pgsr) naming still in a few places (including comments) -- so I would
> just make sure and update everywhere when you rebase in Thomas' latest
> version of the read stream API.

Done.

>
> > From c7500cc1b9068ff0b704181442999cd8bed58658 Mon Sep 17 00:00:00 2001
> > From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> > Date: Mon, 19 Feb 2024 14:30:47 +0300
> > Subject: [PATCH v3 2/2] Use streaming read API in ANALYZE
> >
> > --- a/src/backend/commands/analyze.c
> > +++ b/src/backend/commands/analyze.c
> > @@ -1102,6 +1102,26 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
> > return stats;
> > }
> >
> > +/*
> > + * Prefetch callback function to get next block number while using
> > + * BlockSampling algorithm
> > + */
> > +static BlockNumber
> > +pg_block_sampling_streaming_read_next(StreamingRead *stream,
> > + void *user_data,
> > + void *per_buffer_data)
>
> I don't think you need the pg_ prefix

Done.

>
> > +{
> > + BlockSamplerData *bs = user_data;
> > + BlockNumber *current_block = per_buffer_data;
>
> Why can't you just do BufferGetBlockNumber() on the buffer returned from
> the read stream API instead of allocating per_buffer_data for the block
> number?

Done.

>
> > +
> > + if (BlockSampler_HasMore(bs))
> > + *current_block = BlockSampler_Next(bs);
> > + else
> > + *current_block = InvalidBlockNumber;
> > +
> > + return *current_block;
>
>
> I think we'd like to keep the read stream code in heapam-specific code.
> Instead of doing streaming_read_buffer_begin() here, you could put this
> in heap_beginscan() or initscan() guarded by
> scan->rs_base.rs_flags & SO_TYPE_ANALYZE

In the recent changes [1], heapam_scan_analyze_next_[block | tuple]
are removed from tableam. They are directly called from
heapam-specific code now. So, IMO, no need to do this now.

v4 is rebased on top of v14 streaming read API changes.

[1] 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment Content-Type Size
v4-0001-v14-Streaming-Read-API.patch text/x-patch 74.4 KB
v4-0002-Use-streaming-read-API-in-ANALYZE.patch text/x-patch 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-04-02 07:24:04 Re: Reports on obsolete Postgres versions
Previous Message Zhijie Hou (Fujitsu) 2024-04-02 07:20:46 RE: Synchronizing slots from primary to standby