Proposed refactoring of planner header files

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Proposed refactoring of planner header files
Date: 2019-01-28 20:17:19
Message-ID: 11460.1548706639@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In <6044(dot)1548524131(at)sss(dot)pgh(dot)pa(dot)us> I worried about how much planner
stuff that patch might end up dragging into files that contain
planner support functions, and suggested that we could refactor the
planner's header files to reduce the inclusion footprint. Attached
are some proposed patches to improve the situation. The work isn't
fully done yet, but I was hoping to get buy-in on this approach
before going further.

The basic idea here is to create a new header file, which I chose
to call optimizer/optimizer.h, and put into it just the stuff that
"outside" callers of the planner might need. For this purpose
"outside" can be approximated as "doesn't really need to know what
is in relation.h", ie Paths and related data structures. I expect
that planner support functions will mostly be working with parsetree
data structures for their functions, so they should fit that
restriction. In some cases they need to be able to pass a PlannerInfo
pointer through to some planner function they want to call, but they
can treat the pointer as an opaque type. This worked out pretty well,
as I was able to eliminate uses of all other optimizer/ headers (and,
therefore, relation.h) from all but four or five files outside
backend/optimizer/. The holdouts are mostly places that are pretty
much in bed with the planner anyway, such as statistics/dependencies.c.

I did not attempt to narrow the API used by FDWs, so file_fdw and
postgres_fdw are two of the main places that still need other
optimizer/ headers. It might be useful to do a similar exercise
focusing on the API seen by FDWs, but that's for another time.

Also, I didn't work on tightening selfuncs.c's dependencies.
While I don't have a big problem with considering selfuncs.c to be
in bed with the planner, that's risky in that whatever dependencies
selfuncs.c has may well apply to extensions' selectivity estimators too.
What I'm thinking about doing there is trying to split selfuncs.c into
two parts, one being infrastructure that can be tightly tied to the
core planner (and, likely, get moved under backend/optimizer/) and the
other being estimators that use a limited API and can serve as models
for extension code. But I haven't tried to do that yet, and would like
to get the attached committed first.

There are three patches attached:

0001 takes some very trivial support functions out of clauses.c and
puts them into the generic node support headers (nodeFuncs.h and
makefuncs.h) where they arguably should have been all along. I also
took the opportunity to rename and_clause() and friends into
is_andclause() etc, to make it clearer that they are node-type-testing
functions not node-construction functions, and improved the style a
bit by using "static inline" where practical.

0002 adjusts the API of flatten_join_alias_vars() and some subsidiary
functions so that they take a Query not a PlannerInfo to define the
context they're using for Var transformation. This makes it greatly
less ugly for parse_agg.c to call that function. Without this change
it's impossible for parse_agg.c to be decoupled from relation.h.
It likely also saves some tiny number of cycles, by removing one level
of pointer indirection within that processing.

0003 then creates optimizer.h, moves relevant declarations there, and
adjusts #includes as needed.

Since I was intentionally trying to limit what optimizer.h pulls in,
and in particular not let it include relation.h, I needed an opaque
typedef for PlannerInfo. On the other hand relation.h also needs to
typedef that. I fixed that with a method that we've not used in our
code AFAIK, but is really common in system headers: there's a #define
symbol to remember whether we've created the typedef, and including
both headers in either order will work fine.

optimizer.h exposes a few of the planner's GUCs, but just the basic
cost parameters, which are likely to be useful to planner support
functions. Another choice is to expose all of them, but I'm not
sure that's a great idea --- see gripe below for an example of why
that can encourage broken coding.

I intentionally limited 0003 to just do header refactoring, not code
motion, so there are some other follow-on tasks I'm thinking about.
Notably:

I'm really unhappy that force_parallel_mode and
parallel_leader_participation are being treated as planner GUCs.
They are not that, IMO, because they also affect the behavior of
the executor, cf HandleParallelMessage, ExecGather, ExecGatherMerge.
This is somewhere between ill-considered and outright broken: what
happens if the values change between planning and execution? I think
we probably need to fix things so that those variables do not need to
be looked at by the executor, carrying them forward in the plan
tree if necessary. Then they'd not need to be exposed by
optimizer.h, and indeed I think the mentioned modules wouldn't
need any optimizer inclusions anymore.

Most everything that's being exposed from tlist.c and var.c could be
argued to be generic parsetree-manipulation support that should be
somewhere else, perhaps backend/nodes/ or backend/parser/. If we
moved those functions, I think we could get to a place where
backend/parser/ doesn't use any optimizer headers at all, which seems
like a good idea from a modularity standpoint.

Likewise, maybe expand_function_arguments() should be elsewhere.

It seems like evaluate_expr() probably doesn't belong in the planner
at all; it looks like an executor function doesn't it?

It seems possible that cost_qual_eval() should be exposed by
optimizer.h, but to do so we'd need to expose struct QualCost,
which is a creature of relation.h ATM. Seeing that typedef Cost
is already in nodes.h, maybe it wouldn't be too awful to put
QualCost there too?

I would have exposed estimate_rel_size, which is needed by
access/hash/hash.c, except that it requires Relation and
BlockNumber typedefs. The incremental value from keeping
hash.c from using plancat.h probably isn't worth widening
optimizer.h's #include footprint further. Also, I wonder
whether that whole area needs a rethink for pluggable storage.

Comments?

regards, tom lane

Attachment Content-Type Size
0001-move-trivial-node-support.patch text/x-diff 32.8 KB
0002-flatten-join-alias-with-query.patch text/x-diff 10.5 KB
0003-refactor-optimizer-headers.patch text/x-diff 60.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2019-01-28 20:24:36 Re: Alternative to \copy in psql modelled after \g
Previous Message Robert Haas 2019-01-28 19:47:33 Re: proposal: new polymorphic types - commontype and commontypearray