Re: pg_plan_advice

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_plan_advice
Date: 2025-12-10 14:54:13
Message-ID: CA+TgmoY0J9==y93stO7QaAU+TyA-zm5wqsc4GWUa0hgzPAi_sw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 10, 2025 at 6:20 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> These are just high-level comments after browsing the patches and
> reading some bits like pgpa_identifier to get myself familiarized with
> the project. I like that the key concept here is plan stability
> rather than plan control, because that framing makes it easier to
> treat this as infrastructure instead of policy.

Thanks, I agree. I'm sure people will use this for plan control, but
if you start with that, then it's really unclear what things you
should allow to be controlled and what things not. Defining the focus
as plan stability makes round-trip safety a priority and the scope of
what you can request is what the planner could have generated had the
costing come out just so. There's still some definitional questions at
the margin, but IMHO it's much less fuzzy.

> On the relation identifier system: IMHO this part doesn't seem as
> opinionated as the advice mini-language. The requirements pretty much
> dictate the design -- you need alias names and occurrence counters to
> handle self-joins, partition fields for partitioned tables, and a
> string representation to survive dump/restore. There doesn't seem to
> be much flexibility in that.

Right. There's some flexibility. For instance, you could handle
partitions using occurrence numbers, which would actually save a bunch
of code, but that seems obviously worse in terms of user experience.
Also, you could if you wanted key it off of the name of the table
rather than the relation alias used for the table. I think that's also
worse but possibly it's debatable. You could change the order of the
pieces in the representation; e.g. maybe plan_name should come first
rather than last; or you could change the separator characters. But,
honestly, none of that strikes me as sufficient grounds to want
multiple implementations. If the choices I've made don't seem good to
other people, then we should just change them and hopefully find
something everybody can live with. It's a bit like the way that
extension SQL scripts use "--" as a separator: maybe not everybody
agrees that this is the absolutely most elegant choice, but nobody's
proposing a a second version of the extension mechanism just to do
something different.

> Given that, it seems more practical to put this in core from the
> start. Extensions that might want to build plan-advice-like
> functionality shouldn’t have to clone this logic and wait another
> release for something that’s already well-defined and deterministic.
> The mini-language is opinionated and belongs in contrib, but the
> identifier infrastructure just solves a fundamental problem cleanly.

It's not quite as easy to make a sharp distinction between these
things as someone might hope. Note that the lexer and parser handle
the whole mini-language, which includes parsing the relation
identifiers. That doesn't of course mean that the code to *generate*
relation identifiers couldn't be in core, and I actually had it that
way at one point, but it's not very much code and I wasn't too
impressed with how that turned out. It seemed to couple the core code
to the extension more tightly than necessary for not much real
benefit.

But that's not to say I disagree with you categorically. Suppose we
decided (and I'm not saying we should) to start showing relation
identifiers in EXPLAIN output instead of identifying things in EXPLAIN
output as we do today. Maybe we even decide to show elided subqueries
and similar as first-class parts of the EXPLAIN output, also using
relation identifier syntax. That would be a pretty significant change,
and would destabilize a WHOLE LOT of regression test outputs, but then
relation identifiers become a first-class PostgreSQL concept that
everyone who looks at EXPLAIN output will encounter and, probably,
come to understand. Then obviously the relation identifier generation
code needs to be in core, and that makes total sense because we're
actually using it for something, and arguably we've made life easier
for everyone who wants to use pg_plan_advice in the future because
they're already familiar with the identifier convention. The downside
is everyone has to get used to the new EXPLAIN output even if they
don't care about pg_plan_advice or hate it with a fiery passion.

So my point here is that there are things we can decide to do to make
some or all of this "core," but IMHO it's not just as simple as saying
"this is in, that's out". It's more about deciding what the end state
ought to look like, and how integrated this stuff ought to be into the
fabric of PostgreSQL. I started with the minimal level of integration:
little pieces of core infrastructure, all used by a giant extension.
Now we need to either decide that's where we want to settle, or decide
to push to some greater or lesser degree toward more integration.

> On the infrastructure patches (0001-0005): these look sensible. The
> range table flattening info, elided node tracking, and append node
> consolidation preserve information that's currently lost -- there's
> some additional overhead to track this, but it's fixed per-relation
> per-subquery, which seems reasonable. The path generation hooks
> (0005) are a clear improvement: moving from global enable_* GUCs to
> per-RelOptInfo pgs_mask gives extensions the granularity they need for
> relation-specific and join-specific decisions. Yes, you need C code to
> use them, but you'd need to write C code to do something of value in
> this area anyway, and the hooks give you control that GUCs can't
> provide.
>
> Overall, I'm supportive of getting these committed once they're ready.
> contrib/pg_plan_advice is a compelling proof-of-concept for why these
> hooks are needed.

Great. I don't think there's anything terribly controversial in
0001-0004. I think the comments and so on might need improving and
there could be little mini-bugs or whatever, but basically I think
they work and I don't anticipate any major problems. However, I'd want
at least one other person to do a detailed review before committing
anything. 0005 might be a little more controversial. There's some
design choices to dislike (though I believe I've made them for good
reason) and there's a question of whether it's as complete as we want.
It might be fine to commit it the way it is and just adjust it later
if we find that something ought to be different, but it's also
possible that we should think harder about some of the choices or hold
off for a bit while other parts of this effort move forward. I'm happy
to hear opinions on the best strategy here.

> I'll try to post more specific comments once I've read this some more.

Thanks for the review so far, and that sounds great!

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-12-10 15:04:13 Re: Small bugs regarding resowner handling in aio.c, catcache.c
Previous Message Álvaro Herrera 2025-12-10 14:33:50 Re: Solaris versus our NLS files