Re: EXPLAIN: showing ReadStream / prefetch stats

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Lukas Fittl <lukas(at)fittl(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: EXPLAIN: showing ReadStream / prefetch stats
Date: 2026-04-07 17:34:16
Message-ID: 552ecf6c-e8a2-4183-afc4-eff592674475@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/7/26 18:07, Melanie Plageman wrote:
> On Tue, Apr 7, 2026 at 8:36 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> I've pushed the first two parts earlier today. Here's a v13 with the
>> remaining pieces rebased on top of that, with updated commit messages
>> and some additional fixes.
>
> I noticed a couple of things while reviewing 0004.
>
> I see I had bitmapheapscan use
> if (!node->ss.ps.instrument || pcxt->nworkers == 0)
> return;
>
> while seq scan uses
> if (!estate->es_instrument || pcxt->nworkers == 0)
> return;
>
> That does seem worth being consistent about. Though the estate one is
> probably better to use and changing bitmapheapscan in this commit
> might be noisy... I don't feel strongly either way.
>

I'm not sure this is just a question of consistency, because BHS may
need the shared instrumentation even if (es_instrument = 0), no? While
the seqscan/tidrange scan only need it with EXPLAIN(IO).

So I think the two nodes should check

((estate->es_instrument & INSTRUMENT_IO) == 0)

The attached v14 does it this way. I left BHS to check ps.instrument, I
think that's fine.

> In ExecSeqScanInstrumentInitWorker() -> shm_toc_lookup() noerror is true:
>
> node->sinstrument = shm_toc_lookup(pwcxt->toc,
> node->ss.ps.plan->plan_node_id
> + PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET,
> true);
>
> I went back and forth on this for bitmapheapscan and ended up going
> with the guard
> if (!node->ss.ps.instrument)
> return;
> and passing noerror as false. I thought it would be better to get an
> error if something went wrong.
>
> If you keep noerror true, there's no reason to check for es_instrument.

I agree this should use noerror=false. It's an error to not find the
expected DSM chunk, and we don't want to hide that. It'd probably fail
with a segfault later, but that's hardly a better outcome.

>
> I also noticed that seq scan doesn't use add_size/mul_size for the
> size calculations when estimating and allocating DSM. I switched to
> using them in bitmapheapscan because I realized other code in that
> file used them. Seems seq scan was doing plain arithmetic without
> those helpers already. And overflow is unlikely here. But I wonder if
> it is worth using them?
>

Seems better to use that, so done in v14.

> It doesn't matter for the Retrieve functions because overflow would
> have caused us to error out at estimation or allocation time.
>
> Most of the seq scan specific stuff above applies to tid range scan as well.
>

OK, fixed that too.

> One other point about 0002: I wonder if the text explain output should
> say "in-progress" instead of "inprogress"
>

True. Changed.

Thanks!

--
Tomas Vondra

Attachment Content-Type Size
v14-0001-Switch-EXPLAIN-to-unaligned-output-for-json-xml-.patch text/x-patch 13.4 KB
v14-0002-Add-EXPLAIN-IO-infrastructure-with-BitmapHeapSca.patch text/x-patch 22.2 KB
v14-0003-auto_explain-Add-new-GUC-auto_explain.log_io.patch text/x-patch 3.6 KB
v14-0004-Add-EXPLAIN-IO-instrumentation-for-SeqScan.patch text/x-patch 14.8 KB
v14-0005-Add-EXPLAIN-IO-instrumentation-for-TidRangeScan.patch text/x-patch 12.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-04-07 17:59:48 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message Daniil Davydov 2026-04-07 17:27:32 Re: test_autovacuum/001_parallel_autovacuum is broken