| From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "bruce(at)momjian(dot)us" <bruce(at)momjian(dot)us>, lepihov(at)gmail(dot)com |
| Subject: | Re: plan shape work |
| Date: | 2025-09-29 01:47:50 |
| Message-ID: | CAMbWs4-Q8nt7-UdKLVW-YP9O1_iqa0Q8fQtGsZvd1eF1hd8Pow@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > Looking at choose_plan_name(), IIUC, the nested loop is used to find
> > the next unused suffix number for a given name. I'm wondering why not
> > simply iterate through glob->subplanNames once, check the suffix
> > number for each name matching the given base name, determine the
> > current maximum suffix, and then use "max_suffix + 1" as the next
> > unused suffix. This approach requires only a single pass through the
> > list, and if there's a bug, the worst-case scenario would be a
> > duplicate name rather than an infinite loop. It seems to me that this
> > approach is both more efficient and less risky.
> "simply" is perhaps not the right adjective there. My guess is that
> this approach nets out to more code, more possibilities for bugs
> (especially in cases where one name is a prefix of another), and
> will be slower in typical cases with just a few subplan names.
>
> As an example of edge cases that your idea introduces, what happens
> if a user-written subquery name is "expr_999999999999999999999999"
> and then we need to generate a unique name based on "expr"? Now
> we have an integer-overflow situation to worry about, with possibly
> platform-dependent results.
I'd argue that this hypothetical edge case can be resolved with a bit
of canonicalization in how subplan names are represented internally.
I think the issue you mentioned arises because there is no clearly
distinction between the base name and the numeric suffix. I haven't
spent much time thinking about it, but an off-the-cuff idea is to
require that all subplan names in glob->subplanNames end with a suffix
of the form "_<number>". (If no numeric suffix is required, we can
use the suffix "_0".) With this convention, we can simply split
on the last underscore: everything before it is the base name, and
everything after is the numeric suffix.
The user-written subquery name "expr_999999999999999999999999" would
be internally represented as "expr_999999999999999999999999_0". Then,
when we need to generate a unique name based on "expr", it won't match
with the base name of that subquery name.
With this canonicalization in place, my proposed approach is simply a
matter of applying strrchr(name, '_') and tracking the maximum suffix
number in a single pass over glob->subplanNames. I think this can be
handled with straightforward, basic C code. It seems to me that
this could also eliminate the need for the additional loop under the
"if (!always_number)" branch in choose_plan_name().
By replacing a pass over the subplanNames list plus a nested loop with
a single pass, I doubt that this would be slower in typical cases.
- Richard
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Richard Guo | 2025-09-29 02:09:07 | Re: Eager aggregation, take 3 |
| Previous Message | Chao Li | 2025-09-29 01:46:05 | Re: Fix locking issue with fixed-size stats template in injection_points |