Re: plan shape work

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "bruce(at)momjian(dot)us" <bruce(at)momjian(dot)us>, lepihov(at)gmail(dot)com
Subject: Re: plan shape work
Date: 2025-09-29 01:47:50
Message-ID: CAMbWs4-Q8nt7-UdKLVW-YP9O1_iqa0Q8fQtGsZvd1eF1hd8Pow@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > Looking at choose_plan_name(), IIUC, the nested loop is used to find
> > the next unused suffix number for a given name. I'm wondering why not
> > simply iterate through glob->subplanNames once, check the suffix
> > number for each name matching the given base name, determine the
> > current maximum suffix, and then use "max_suffix + 1" as the next
> > unused suffix. This approach requires only a single pass through the
> > list, and if there's a bug, the worst-case scenario would be a
> > duplicate name rather than an infinite loop. It seems to me that this
> > approach is both more efficient and less risky.

> "simply" is perhaps not the right adjective there. My guess is that
> this approach nets out to more code, more possibilities for bugs
> (especially in cases where one name is a prefix of another), and
> will be slower in typical cases with just a few subplan names.
>
> As an example of edge cases that your idea introduces, what happens
> if a user-written subquery name is "expr_999999999999999999999999"
> and then we need to generate a unique name based on "expr"? Now
> we have an integer-overflow situation to worry about, with possibly
> platform-dependent results.

I'd argue that this hypothetical edge case can be resolved with a bit
of canonicalization in how subplan names are represented internally.
I think the issue you mentioned arises because there is no clearly
distinction between the base name and the numeric suffix. I haven't
spent much time thinking about it, but an off-the-cuff idea is to
require that all subplan names in glob->subplanNames end with a suffix
of the form "_<number>". (If no numeric suffix is required, we can
use the suffix "_0".) With this convention, we can simply split
on the last underscore: everything before it is the base name, and
everything after is the numeric suffix.

The user-written subquery name "expr_999999999999999999999999" would
be internally represented as "expr_999999999999999999999999_0". Then,
when we need to generate a unique name based on "expr", it won't match
with the base name of that subquery name.

With this canonicalization in place, my proposed approach is simply a
matter of applying strrchr(name, '_') and tracking the maximum suffix
number in a single pass over glob->subplanNames. I think this can be
handled with straightforward, basic C code. It seems to me that
this could also eliminate the need for the additional loop under the
"if (!always_number)" branch in choose_plan_name().

By replacing a pass over the subplanNames list plus a nested loop with
a single pass, I doubt that this would be slower in typical cases.

- Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2025-09-29 02:09:07 Re: Eager aggregation, take 3
Previous Message Chao Li 2025-09-29 01:46:05 Re: Fix locking issue with fixed-size stats template in injection_points