| From: | Lukas Fittl <lukas(at)fittl(dot)com> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me> |
| Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: EXPLAIN: showing ReadStream / prefetch stats |
| Date: | 2026-04-07 08:00:28 |
| Message-ID: | CAP53PkxCmAF9qioQpGM8+H4sb_0ThUrvKR3wLEQ+8QNdVN_tQQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Apr 6, 2026 at 6:14 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 4/7/26 02:19, Melanie Plageman wrote:
> > On Mon, Apr 6, 2026 at 4:39 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >>
> >> On 4/6/26 18:50, Melanie Plageman wrote:
> >> If you think you can get this pushed, I'll do my best to finalize the
> >> instrumentation and SeqScan/TidRangeScan parts.
> >
> > I've pushed the first patch for index/index-only scans. Attached is
> > the BHS fix that uses the new pattern. It needs at least one review
> > before pushing. While I was polishing it, I realized I neglected to
> > use add_size()/mul_size() in the index-only/index scan patches. So,
> > 0002 is just a fix commit to do that. Feel free to push these if you
> > think they're ready. Otherwise, I'll do so pending your review in my
> > morning.
> >
>
> Thanks!
>
> I'll take a look in my morning, and will consider pushing the changes so
> that I can start pushing the parts adding the new instrumentation.
FWIW, 0001 and 0002 in the v12 patch set also make sense to me from a
code review just now.
>
> I've spent a fair amount of time looking at those parts today. Attached
> is v12 (including the two parts you posted as v11)
0003 looks good, but commit message needs improvement / justification
of the change (I assume you had already planned that).
0004:
> diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
> index 5b8b521802e..a854c41e963 100644
> --- a/doc/src/sgml/ref/explain.sgml
> +++ b/doc/src/sgml/ref/explain.sgml
> @@ -46,6 +46,7 @@ EXPLAIN [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] <rep
> TIMING [ <replaceable class="parameter">boolean</replaceable> ]
> SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
> MEMORY [ <replaceable class="parameter">boolean</replaceable> ]
> + IO [ <replaceable class="parameter">boolean</replaceable> ]
> FORMAT { TEXT | XML | JSON | YAML }
> </synopsis>
> </refsynopsisdiv>
> @@ -298,6 +299,17 @@ ROLLBACK;
> </listitem>
> </varlistentry>
> + <varlistentry>
> + <term><literal>IO</literal></term>
> + <listitem>
> + <para>
> + Include information on I/O performed by each node.
> + This parameter may only be used when <literal>ANALYZE</literal> is also
> + enabled. It defaults to <literal>FALSE</literal>.
> + </para>
> + </listitem>
> + </varlistentry>
+
Assuming my understanding of where this works is correct, this needs
to be qualified I think in terms of which nodes get I/O activity
shown. Otherwise I as the user will expect e.g. an Index Scan to show
I/O statistics. Additionally I think this should clearly say "read
I/O", since writes I/O information wouldn't show for any nodes.
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index f151f21f9b3..863a9dd0f0d 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -3984,6 +3990,112 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
> }
> }
> ...
> + /* prefetch I/O info (only if there were actual I/Os) */
> + if (stats->io_count > 0)
> + {
> + ExplainIndentText(es);
> + appendStringInfo(es->str, "I/O: count=%" PRIu64 " waits=%" PRIu64
> + " size=%.2f inprogress=%.2f\n",
> + stats->io_count, stats->wait_count,
> + (stats->io_nblocks * 1.0 / stats->io_count),
> + (stats->io_in_progress * 1.0 / stats->io_count));
> + }
I've briefly pondered again whether "I/O:" is a bit too broad here in
EXPLAIN text output, i.e. whether it should qualified it somehow, e.g.
"I/O Stream", since its just the read stream I/O activity.
Right now it looks like this:
Seq Scan on xxx (cost=0.00..7.14 rows=114 width=483) (actual
time=0.140..0.222 rows=114.00 loops=1)
Prefetch: avg=1.67 max=3 capacity=94
I/O: count=3 waits=3 size=2.00 inprogress=1.00
Buffers: shared read=6
I/O Timings: shared read=0.158
What if we did this instead:
Seq Scan on xxx (cost=0.00..7.14 rows=114 width=483) (actual
time=0.140..0.222 rows=114.00 loops=1)
Prefetch: avg=1.67 max=3 capacity=94
I/O Stream: count=3 waits=3 size=2.00 inprogress=1.00
Buffers: shared read=6
I/O Timings: shared read=0.158
But I'm 50/50 whether that's a big enough problem to debate right now.
Especially with index prefetching out of the picture this is actually
less of a problem, since this should only show on scans where the I/O
activity is mainly (fully?) relying on read streams. If we had index
prefetching, my understanding is we would have had "I/O" show the read
stream activity (i.e. table access), but not the index access,
correct?
Btw, in that same test, when I re-run I get Prefetch but not I/O -- why is that?
Seq Scan on organizations (cost=0.00..7.42 rows=1 width=483) (actual
time=0.034..0.042 rows=1.00 loops=1)
Filter: (slug = 'pganalyze'::text)
Rows Removed by Filter: 113
Prefetch: avg=1.00 max=1 capacity=94
Buffers: shared hit=6
(if its all buffers hit, why are there any prefetches at all?)
> with two or three
> bigger changes compared to earlier versions:
>
> 1) default OFF
>
> After thinking about it a bit more, I decided to change the default to
> OFF. On the one hand I agree it's somewhat similar to BUFFERS, and that
> option is ON by default now. But on the other hand, we must not clutter
> EXPLAIN output with too much information, and I'm not convinced IO is
> worth it. So I changed to OFF by default. That also makes the patches
> smaller, due to not having to adjust that many tests.
Seems fair in terms of what to do now.
>
> 2) auto_explain
>
> It also occurred to me we should add a matching option to auto_explain,
> similarly to log_buffers. 0005 does that.
Agreed.
>
> 3) INSTRUMENT_IO
>
> The auto_explain bit also implies we need a new INSTRUMENT_IO constant,
> to handle this just like BUFFERS. Which also means we can actually
> collect the stats only when the IO option is specified (instead of for
> all ANALYZE runs). Which is nice.
FWIW, it'd be the only kind that's not populated by instrument.c (and
isn't a part of Instrumentation), but I don't see any immediate
practical issues with having the extra flag.
It does mean that INSTRUMENT_ALL will turn on INSTRUMENT_IO though for
the Instrumentation, which will always be a no-op. In practice the
only users of that are the InstrAlloc calls in
pg_stat_statements/auto_explain, which I have a patch for to refactor
anyway (over on the stack-based instrumentation thread, which to be
clear is PG20 material at this point).
We could probably just drop INSTRUMENT_ALL in the next cycle to avoid
confusion. For context, that use of INSTRUMENT_ALL won't affect
es_instrument, just how queryDesc->totaltime is initialized, and what
it captures at a summary level - and it won't know how to capture I/O
stats since they are not part of Instrumentation.
I would suggest noting the fact that its a no-op for Instrumentation
itself somewhere in code at least, e.g. where INSTRUMENT_IO is defined
on InstrumentOption.
0005 looks good.
I've skimmed 0006 and 0007 and it makes sense to me why they exist
(and we should add them), but I haven't reviewed the code for
correctness.
Thanks,
Lukas
--
Lukas Fittl
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Lukas Fittl | 2026-04-07 08:05:29 | Re: EXPLAIN: showing ReadStream / prefetch stats |
| Previous Message | Richard Guo | 2026-04-07 08:00:17 | Re: Potential partition pruning regression on PostgreSQL 18 |