Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))
Date: 2022-07-26 22:38:53
Message-ID: CAKFQuwbnT_B9i4NFWXX0V=xdcsTi0XTHvm6NVcr8XHi39O_A7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 24, 2022 at 9:54 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> I'm renaming this thread for better visibility, since buffers is a small,
> optional part of the patches I sent.
>
> I made a CF entry here.
> https://commitfest.postgresql.org/36/3409/
>
> On Wed, Dec 01, 2021 at 06:58:20PM -0600, Justin Pryzby wrote:
> > On Mon, Nov 15, 2021 at 01:09:54PM -0600, Justin Pryzby wrote:
> > > Some time ago, I had a few relevant patches:
> > > 1) add explain(REGRESS) which is shorthand for (BUFFERS OFF, TIMING
> OFF, COSTS OFF, SUMMARY OFF)
> > > 2) add explain(MACHINE) which elides machine-specific output from
> explain;
> > > for example, Heap Fetches, sort spaceUsed, hash nbuckets, and
> tidbitmap stuff.
> > >
> > >
> https://www.postgresql.org/message-id/flat/20200306213310(dot)GM684(at)telsasoft(dot)com
> >
> > The attached patch series now looks like this (some minor patches are not
> > included in this list):
> >
> > 1. add GUC explain_regress, which disables TIMING, SUMMARY, COSTS;
>

> > 2. enable explain(BUFFERS) by default (but disabled by explain_regress);
>

+1

> > 3. Add explain(MACHINE) - which is disabled by explain_regress.
> > This elides various machine-specific output like Memory and Disk use.
> > Maybe it should be called something else like "QUIET" or
> "VERBOSE_MINUS_1"
> > or ??
>

INCIDENTALS ?

Aside from being 4 syllables and, for me at least, a pain to remember how
to spell, it seems to fit.

RUNTIME is probably easier, and we just need to point out the difficulty in
naming things since actual rows is still shown, and timings have their own
independent option.

>
> > The regression tests now automatically run with explain_regress=on,
> which is
> > shorthand for TIMING OFF, SUMMARY OFF, COSTS OFF, and then BUFFERS OFF.
> >
> > There's a further option called explain(MACHINE) which can be disabled
> to hide
> > portions of the output that are unstable, like Memory/Disk/Batches/
> > Heap Fetches/Heap Blocks. This allows "explain analyze" to be used more
> easily
> > in regression tests, and simplifies some existing tests that currently
> use
> > plpgsql functions to filter the output. But it doesn't handle all the
> > variations from parallel workers.
> >
> > (3) is optional, but simplifies some regression tests. The patch series
> could
> > be rephrased with (3) first.
>

I like the idea of encapsulating the logic about what to show into the
generation code instead of a post-processing routine.

I don't see any particular ordering of the three changes being most clear.

>
> > Unfortunately, "COSTS OFF" breaks postgres_fdw remote_estimate. If
> specifying
> > "COSTS ON" in postgres_fdw.c is considered to be a poor fix

Given that we have version tests scattered throughout the postgres_fdw.c
code I'd say protect it with version >= 9.0 and call it good.

But I'm assuming that we are somehow also sending over the GUC to the
remote server (which will be v16+) but I am unsure why that would be the
case. I'm done code reading for now so I'm leaving this an open
request/question.

, then I suppose
> > this patch series could do as suggested and enable buffers by default
> only when
> > ANALYZE is specified. Then postgres_fdw is not affected, and the
> > explain_regress GUC is optional: instead, we'd need to specify BUFFERS
> OFF in
> > ~100 regression tests which use EXPLAIN ANALYZE.

I'm not following the transition from the prior sentences about COSTS to
this one regarding BUFFERS.

(3) still seems useful on its
> > own.
>

It is an orthogonal feature with a different section of our user base being
catered to, so I'd agree by default.

Patch Review Comments:

The following change in the machine commit seems out-of-place; are we
fixing a bug here?

explain.c:
/* Show buffer usage in planning */
- if (bufusage)
+ if (bufusage && es->buffers)

Typo (trailing comment // without comment):

ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); //

I did a pretty thorough look-through of the locations of the various
changes and everything seems consistent with the established code in this
area (took me a bit to get to the level of comprehension though). Whether
there may be something missing I'm less able to judge.

From reading the other main thread I'm not finding this particular choice
of GUC usage to be a problem anyone has raised and it does seem the
cleanest solution, both for us and for anyone out there with a similar
testing framework.

The machine output option covers an obvious need since we've already
written ad-hoc parsing code to deal with the problem.

And, as someone who sees first-hand the kinds of conversations that occur
surrounding beginner's questions, and getting more into performance
questions specifically, I'm sympathetic with the desire to have BUFFERS
something that is output by default. Given existing buy-in for that idea
I'd hope that at minimum the "update 100 .sql and expected output files,
then change the default" commit happens even if the rest take some further
discussion. Though with even just a bit of attention I don't see any real
problems getting the whole thing resolved one way or another.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-07-26 22:45:44 Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Previous Message David Rowley 2022-07-26 22:37:26 Re: Add proper planner support for ORDER BY / DISTINCT aggregates