From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: plan shape work
Date: 2025-05-19 18:01:55
Message-ID: CA+TgmoZxQO8svE_vtNCkEubnCYrnrCEnhftdbkdZ496Nfhg=wQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

A couple of people at pgconf.dev seemed to want to know more about my
ongoing plan shape work, so here are the patches I have currently.
This is a long way from something that actually looks like a usable
feature, but these are bits of infrastructure that I think will be
necessary to get to a usable feature. As a recap, my overall goal here
is to make it so that you can examine a finished plan, figure out what
decisions the planner made, and then somehow get the planner to make
those same decisions over again in a future planning cycle. Since
doing this for all types of planner decisions seems too difficult for
an initial goal, I'm focusing on scans and joins for now. A further
goal is that I want it to be possible for extensions to use this
infrastructure to implement a variety of different policies that they
might feel to be beneficial, so I'm looking to minimize the amount of
stuff that has to be done in core PostgreSQL or can only be used by
core PostgreSQL.

So far I've identified two main problems. First, you need to be able
to reconstruct the planner decisions from the final plan, which you
can almost do already, but we're missing a few key pieces of
information in the final plan tree. Second, you need to be able to
write those decisions down in a way that can be correctly and
unambiguously reinterpreted during a future planning cycle for the
same query. For example, if we say that the planner chose a sequential
scan of table foo, what exactly does that mean? Table foo could appear
multiple times in the query, either in different subqueries or the
same one, and it could be a partitioned table with a different scan
method for each partition.

Let's start by talking about problem #1. I've found two subproblems in
this area so far. The first is that a Result node does not store the
relids of the scan or join that it replaces. Note that a Result node
whose job is to apply a one-time filter or a projection to some
subordinate node is not an issue here -- we can just look through the
Result node to whatever scan or join is beneath it. The concern here
is about the case where a scan or join is proven empty and entirely
replaced by a Result node that has "One-Time Filter: false". Patch
0001 adds that field, and patch 0002 teaches ExplainPreScanNodes about
it, which results in a number of regression test output changes that I
personally consider to be improvements -- with these patches, we
properly qualify some column references with a table alias as EXPLAIN
does in all other cases, as opposed to printing them as bare column
names with no alias. Patch 0003 checks that this is the only problem
of this type that is visible at the stage where we are constructing
join paths.

Still talking about problem #1, the second subproblem I've identified
is that during setrefs processing, we elide trivial SubqueryScan,
Append, and MergeAppend nodes from the final plan. So during planning
we might see, for example, that a certain join is between RTI 4 and
RTI 5 and it's, say, a hash join. But after setrefs processing, it may
appear that RTI 4 was joined to, say, RTI 13, which might not even have
been part of the same subquery level. If we record that we want to see
RTI 4 joined to RTI 13 via a hash join, that will be completely
useless -- the join planning code will never see those two RTIs as
options to be joined to each other. What I've done in 0006 is make it
so that we keep a record of each node elided during setrefs
processing. This list of elided nodes is stored in the PlannedStmt
outside of the portion of the tree that actually gets executed, so
that code that is doing plan tree inspection can look at it but
execution doesn't get any slower (except possibly by having to copy a
slightly larger amount of data around when reading and writing
PlannedStmt objects, which seems like it should be negligible).
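
For illustration, the kind of record I have in mind looks roughly like
this -- a sketch with invented names, not the actual contents of 0006:

    /* one entry per trivial node removed during setrefs processing */
    typedef struct ElidedNode
    {
        NodeTag     elided_tag;     /* T_SubqueryScan, T_Append, or
                                     * T_MergeAppend */
        Bitmapset  *relids;         /* RTIs the elided node covered */
    } ElidedNode;

    /* ...plus a List of these in PlannedStmt that the executor never reads */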

Now let's talk about problem #2. I believe that we do not actually
want to refer to what happened to RTI 4 and RTI 5 as I mooted in the
previous paragraph, but rather to refer to relations by some kind of
name. However, we can't use the names shown in the EXPLAIN output,
because those are not really present in the plan and are only assigned
on-the-fly by EXPLAIN; hence, they can't be known at plan time. Since
planning proceeds one subquery at a time, I think the right way to
approach this problem is to first name the subquery and then to name
the table within that subquery. Subqueries sort of have names right
now, at least some of them, but it's an odd system: a CTE subquery,
for example, has the name mentioned by the user, but other kinds of
subplans just get names like "InitPlan 3" or "SubPlan 2". The real
problem, though, is that those names are only assigned after we've
FINISHED planning the subquery. If we begin planning our very first
subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while
planning it we recurse into some further subquery then *that* subquery
might become InitPlan 1 or SubPlan 1 and OUR subquery might become
InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and
recurse into them too). Thus, being given some information about how
the user wants, say, SubPlan 2 to be planned is completely useless
because we won't know whether that is us until after we've done the
planning that the user is trying to influence.

To solve that problem, I decided to arrange for every subquery to have
a unique name that is assigned before we begin planning it. Patch 0004
does this. Then, 0005 records those names in the final plan. That's
enough that you can look at the scanrelid (or apprelids, etc.) of a
Plan node and relate that back to a named subquery and a particular
RTI within that subquery. There is still the problem of how to name
relations within a single subquery, since it's possible to reuse
aliases within the same subquery level (simple cases are rejected, but
there are at least two ways to bypass the error check, and they look
intentional, so we can't just block it). Then, references to a certain
alias name can be further multiplied by inheritance expansion. This is
all a bit hairy, but I haven't really found any fundamental problems
here that would keep you from deciding on a workable naming convention.
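
Put differently, the identifier I'm aiming for amounts to something like
the following -- again just a sketch with invented names, not what 0004
and 0005 actually store:

    /* a stable way to point at one relation within one subquery */
    typedef struct PlannedRelationName
    {
        const char *subquery_name;  /* assigned before we begin planning
                                     * the subquery, unlike "SubPlan n" */
        const char *alias;          /* the alias used within that subquery */
        int         occurrence;     /* distinguishes a reused alias, and
                                     * relations added by inheritance
                                     * expansion */
    } PlannedRelationName;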

Hope you find this interesting. If you do, let me know what you think.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0005-Store-information-about-range-table-flattening-in.patch application/octet-stream 6.7 KB
v1-0004-Give-subplans-names-that-are-known-while-planning.patch application/octet-stream 151.4 KB
v1-0003-Assert-that-RTIs-of-joined-rels-are-discoverable-.patch application/octet-stream 7.1 KB
v1-0001-Keep-track-of-what-RTIs-a-Result-node-is-scanning.patch application/octet-stream 62.7 KB
v1-0002-Consider-a-Result-node-s-relids-in-ExplainPreScan.patch application/octet-stream 7.9 KB
v1-0006-Store-information-about-elided-nodes-in-the-final.patch application/octet-stream 9.2 KB
