Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG

From: Vladimir Churyukin <vladimir(at)churyukin(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>, pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
Date: 2023-01-04 08:52:41
Message-ID: CAFSGpE3pdH4pgUWvJm0X0CozXgNmCALZZEP0GdT4HvBdo_Vtzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 3, 2023 at 9:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > The thing I had in mind was some mode that would record additional
> > details during planning that could be tagged onto the final plan in
> > createplan.c so that EXPLAIN could display them. I just think that
> > EXPLAIN is the place where people go to learn about _what_ the plan is
> > and it might also be the place where they might expect to go to find
> > out more details about _why_ that plan was chosen. I by no means have
> > a fully bakes idea on what that would look like, but I just think that
> > dumping a bunch of lines to stdout is not going to be useful to many
> > people and we need to think of something better in order to properly
> > make this useful.
>
> There's a number of problems in this area, but I think the really
> fundamental issue is that for speed reasons the planner wants to
> reject losing plan alternatives as quickly as possible. So we simply
> don't pursue those alternatives far enough to produce anything that
> could serve as input for EXPLAIN (in its current form, anyway).
>

That's not necessarily a fundamental issue for EXPLAIN (well, in theory,
not sure if there are fundamental limitations of the current
implementation).
When somebody runs EXPLAIN, they don't necessarily care that much about its
performance, as long as it returns results in reasonable time.
So if the planner does some extra work in that mode to better display why
the specific path was chosen, it should probably be ok from the performance
perspective.

> What that means is that a trace of add_path decisions just can't be
> very useful to an end user: there isn't enough data to present the
> decisions in a recognizable form, besides which there is too much
> noise because most of the rejected options are in fact silly.
> So indeed we find that even hard-core developers aren't interested
> in consuming the data in that form.
>

Even if the output is not very human-readable, it still can be useful, if
there are tools that consume the output and extract
meaningful data while omitting meaningless noise (if the meaningful data
exists there of course).

> Another issue is that frequently the problem is that we never
> considered the desired plan at all, so that even if you had a
> perfectly readable add_path trace it wouldn't show you what you want
> to see. This might happen because the planner is simply incapable of
> producing that plan shape from the given query, but often it happens
> for reasons like "function F() used in the query is marked volatile,
> so we didn't flatten a subquery or consider an indexscan or whatever".
> I'm not sure how we could produce output that would help people
> discover that kind of problem ... but I am sure that an add_path
> trace won't do it.
>
> So, not only am I pretty down on exposing OPTIMIZER_DEBUG in
> its current form, but I don't really believe that adding hooks
> to add_path would allow an extension to produce anything of value.
> I'd for sure want to see a convincing demonstration to the contrary
> before we slow down that hot code path by adding hook calls.

Pardon my ignorance, but I'm curious, how changes in planner code are
currently validated?
Let's say, you add some extra logic that introduces different paths in some
cases, or adjust some constants. How do you validate this logic doesn't
slow down something else dramatically?
I see some EXPLAIN output checks in regression tests (not that many
though), so I'm curious how regressions in planning are currently tested.
Not the simple ones, when you have a small input and predictable
plan/output, but something that can happen with more or less real data
distribution on medium / large datasets.

-Vladimir Churyukin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2023-01-04 09:28:20 Re: Underscores in numeric literals
Previous Message Ankit Kumar Pandey 2023-01-04 08:21:56 Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG