Re: Proposed refactoring of planner header files

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposed refactoring of planner header files
Date: 2019-01-28 20:32:29
Message-ID: 20190128203229.yreabsq6kinxdnh5@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-01-28 15:17:19 -0500, Tom Lane wrote:
> In <6044(dot)1548524131(at)sss(dot)pgh(dot)pa(dot)us> I worried about how much planner
> stuff that patch might end up dragging into files that contain
> planner support functions, and suggested that we could refactor the
> planner's header files to reduce the inclusion footprint. Attached
> are some proposed patches to improve the situation. The work isn't
> fully done yet, but I was hoping to get buy-in on this approach
> before going further.

> The basic idea here is to create a new header file, which I chose
> to call optimizer/optimizer.h, and put into it just the stuff that
> "outside" callers of the planner might need. For this purpose
> "outside" can be approximated as "doesn't really need to know what
> is in relation.h", ie Paths and related data structures. I expect
> that planner support functions will mostly be working with parsetree
> data structures for their functions, so they should fit that
> restriction. In some cases they need to be able to pass a PlannerInfo
> pointer through to some planner function they want to call, but they
> can treat the pointer as an opaque type. This worked out pretty well,
> as I was able to eliminate uses of all other optimizer/ headers (and,
> therefore, relation.h) from all but four or five files outside
> backend/optimizer/. The holdouts are mostly places that are pretty
> much in bed with the planner anyway, such as statistics/dependencies.c.

Without having studied the patch in any detail, that seems a worthwhile
goal to me.

> Since I was intentionally trying to limit what optimizer.h pulls in,
> and in particular not let it include relation.h, I needed an opaque
> typedef for PlannerInfo. On the other hand relation.h also needs to
> typedef that. I fixed that with a method that we've not used in our
> code AFAIK, but is really common in system headers: there's a #define
> symbol to remember whether we've created the typedef, and including
> both headers in either order will work fine.

Ugh, isn't it nicer to just use the underlying struct type instead of
that? Or alternatively we could expand our compiler demands to require
that redundant typedefs are allowed - I'm not sure there's any animal
left that doesn't support that (rather than triggering an error it via
an intentionally set flag).

> I'm really unhappy that force_parallel_mode and
> parallel_leader_participation are being treated as planner GUCs.
> They are not that, IMO, because they also affect the behavior of
> the executor, cf HandleParallelMessage, ExecGather, ExecGatherMerge.
> This is somewhere between ill-considered and outright broken: what
> happens if the values change between planning and execution? I think
> we probably need to fix things so that those variables do not need to
> be looked at by the executor, carrying them forward in the plan
> tree if necessary. Then they'd not need to be exposed by
> optimizer.h, and indeed I think the mentioned modules wouldn't
> need any optimizer inclusions anymore.

Stylistically I agree with that (and it's what I ended up doing for JIT
compilation as well), but do you see concrete harm here? I don't think
it's super likely that anybody changes these on the fly and expects
immediate effects?

> I would have exposed estimate_rel_size, which is needed by
> access/hash/hash.c, except that it requires Relation and
> BlockNumber typedefs. The incremental value from keeping
> hash.c from using plancat.h probably isn't worth widening
> optimizer.h's #include footprint further. Also, I wonder
> whether that whole area needs a rethink for pluggable storage.

The pluggable storage patchset currently makes estimate_rel_size() a
callback for table-like objects. While using BlockNumber isn't all that
pretty (it's one of a few tableam functions dealing with blocks, the
others being samplescan and bitmapscan). I've been wondering whether we
should change the API to return the size in a non-block oriented way,
but given the current costing I wasn't sure that's worth much.

There's an XXX in the changed code wondering whether we should make
estimate_rel_size() also call a callback for indexes, that'd e.g. allow
better per-index logic (c.f. the function's comment about gist).

What kind of rejiggering were you thinking of re pluggable storage?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-01-28 20:33:10 Re: Proposed refactoring of planner header files
Previous Message Hugh Ranalli 2019-01-28 20:26:12 Re: BUG #15548: Unaccent does not remove combining diacritical characters