Re: Use streaming read API in ANALYZE

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jakub(dot)wartak(at)enterprisedb(dot)com, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-04-07 22:00:00
Message-ID: 20240407220000.7dt4ynyyfi6fuzqy@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote:
> From 1dc2343661f3edb3b1bc4307afb0e956397eb76c Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Sun, 7 Apr 2024 14:55:22 -0400
> Subject: [PATCH v10 1/3] Make heapam_scan_analyze_next_[tuple|block] static.
>
> 27bc1772fc81 removed the table AM callbacks scan_analyze_next_block and
> scan_analzye_next_tuple -- leaving their heap AM implementations only
> called by acquire_sample_rows().

Ugh, I don't think 27bc1772fc81 makes much sense. But that's unrelated to this
thread. I did raise that separately
https://www.postgresql.org/message-id/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de

Unless I seriously missed something, I see no alternative to reverting that
commit.

> @@ -1206,11 +1357,13 @@ acquire_sample_rows(Relation onerel, int elevel,
> break;
>
> prefetch_block = BlockSampler_Next(&prefetch_bs);
> - PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_block);
> + PrefetchBuffer(scan->rs_base.rs_rd, MAIN_FORKNUM, prefetch_block);
> }
> }
> #endif
>
> + scan->rs_cbuf = InvalidBuffer;
> +
> /* Outer loop over blocks to sample */
> while (BlockSampler_HasMore(&bs))
> {

I don't think it's good to move a lot of code *and* change how it is
structured in the same commit. Makes it much harder to actually see changes /
makes git blame harder to use / etc.

> From 90d115c2401567be65bcf64393a6d3b39286779e Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Sun, 7 Apr 2024 15:28:32 -0400
> Subject: [PATCH v10 2/3] Use streaming read API in ANALYZE
>
> The ANALYZE command prefetches and reads sample blocks chosen by a
> BlockSampler algorithm. Instead of calling Prefetch|ReadBuffer() for
> each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0.
>
> Author: Nazir Bilal Yavuz
> Reviewed-by: Melanie Plageman
> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
> ---
> src/backend/commands/analyze.c | 89 ++++++++++------------------------
> 1 file changed, 26 insertions(+), 63 deletions(-)

That's a very nice demonstration of how this makes good prefetching easier...

> From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Sun, 7 Apr 2024 15:38:41 -0400
> Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore()
>
> A previous commit stopped using BlockSampler_HasMore() for flow control
> in acquire_sample_rows(). There seems little use now for
> BlockSampler_HasMore(). It should be sufficient to return
> InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore()
> would have returned false. Remove BlockSampler_HasMore().
>
> Author: Melanie Plageman, Nazir Bilal Yavuz
> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com

The justification here seems somewhat odd. Sure, the previous commit stopped
using BlockSampler_HasMore in acquire_sample_rows - but only because it was
moved to block_sampling_streaming_read_next()?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-04-07 22:11:32 Re: ❓ JSON Path Dot Precedence
Previous Message Melih Mutlu 2024-04-07 21:56:56 Re: Flushing large data immediately in pqcomm