| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-05-05 04:26:56 |
| Message-ID: | CAAAe_zCwY4Lt_OS44DMuqua-szeDVVTgJDeH=c-d5Qco5v2sOQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Tatsuo,
Thanks for the close look. Inline below, then a sketch of what I
plan to put in v48 so you can pick what to pull first.
v47 starts to check whether range variable qualified expressions are
> used in DEFINE clause. If used, raise an error. This is good because
> we don't support the syntax yet (pattern variable range var), or it's
> prohibited (from clause range var). However, the error message may not
> be appropreate for the case when complex data type is involved.
>
> CREATE TEMP TABLE item (name TEXT, amount INT);
> CREATE TABLE
> CREATE TEMP TABLE sales(items item);
> CREATE TABLE
> SELECT (items).name, (items).amount, count(*) OVER w
> FROM sales WINDOW w AS (
> ORDER BY (items).name
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> PATTERN (A+)
> DEFINE A AS (A.items).amount > 10
> );
> ERROR: pattern variable qualified column reference "a.items" is not
> supported in DEFINE clause
> LINE 7: DEFINE A AS (A.items).amount > 10
> ^
> If I change the DEFINE clause to:
>
> DEFINE A AS (sales.items).amount > 10
>
> I get:
>
> ERROR: range variable qualified column reference "sales.items" is not
> allowed in DEFINE clause
> LINE 8: DEFINE A AS (sales.items).amount > 10
> ^
>
> In both cases, "a.items" or "sales.items" in the error messages are
> not column names, therefore the wording "column reference" in the
> error messages are not appropriate.
>
Agreed. Even for the simple non-composite cases the noun is fuzzy
once A_Indirection enters the picture. "expression" reads naturally
in both cases.
> In order to fix the issue, I think we need to add code to understand
> range var qualification form "A.item" or "sales.items" in complex data
> types case. But is it worth the trouble? The reword is just nicer
> error messages. If we support MEASURES, the code is useful. But so far
> we have decided to not support it in the first cut of RPR.
>
Not now. Teaching the pre-check about the surrounding A_Indirection
duplicates work that MEASURES will need anyway, and the only visible
gain before MEASURES is a more accurate echo of the source text.
Instead I'll mark the limitation in three places so future MEASURES
work can't miss it: a block comment above the pre-check that names
the limitation and the revisit point, an XXX cross-reference in
parse_rpr.c pointing back to the pre-check, and composite-type cases
in rpr_base whose expected output quotes only the ColumnRef portion
-- so when indirection-aware quoting lands, those outputs churn as
a tripwire.
> Maybe we should just change the error messages something like below
> for now?
>
> ERROR: pattern variable qualified expression "a.items" is not supported
> in DEFINE clause
> ERROR: range variable qualified expression "sales.items" is not allowed
> in DEFINE clause
Yes -- planned for v48. While I'm in that path I'll also clean up
two adjacent issues:
- the pre-check is binary, so an unknown qualifier (typo,
undefined name) is misreported as a range-variable reference
with SYNTAX_ERROR instead of falling through to the standard
"column does not exist" / "missing FROM-clause entry"
diagnostic;
- parse_rpr.c and the SELECT docs claim DEFINE-name "collection"
and "filtering during planning"; the actual behavior is
validate-and-reject at parse analysis.
------------------------------------------------------------------
v48 follow-up plan
------------------------------------------------------------------
The items below are independent, so they can ship as separate
patches or as a single batched posting -- whichever you prefer.
Order is just for narrative; nothing depends on anything else.
[A] Sizable refactor: collapse the four DEFINE walkers across
parser/planner/executor into a single phase-tagged
traversal. On that base, reject volatile / NextValueExpr
in DEFINE (the NFA may re-evaluate predicates during
backtracking; STABLE / IMMUTABLE remain accepted), and
bundle a STABLE/IMMUTABLE baseline test as a guard against
accidental over-rejection.
[B] Make the empty-match path observable in tests: replace the
stale XXX comments in rpr_nfa with the actual behavior, and
add EXPLAIN ANALYZE coverage in rpr_explain that surfaces
"NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window-
empty case is regression-visible.
[C] Trim the per-advance NFA visited-bitmap reset to a high-water
mark range instead of the full bitmap. Tradeoff: two int16
comparisons added per visit, paying off for larger NFAs but
added overhead for single-word bitmaps; semantics unchanged.
I'll leave the decision to apply to your judgment.
[D] DEFINE qualifier diagnostic: tri-classify (pattern var /
range var / fall-through), reword to "expression", add
unknown-qualifier and composite-type tests, and sync the
adjacent stale comments and SELECT doc.
If any of [A]..[D] looks misjudged or you'd prefer a different
slicing, I'll reshape before posting.
Best,
Henson
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2026-05-05 04:57:47 | Re: Include schema-qualified names in publication error messages. |
| Previous Message | shveta malik | 2026-05-05 04:06:55 | Re: Proposal: Conflict log history table for Logical Replication |