| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | jian(dot)universality(at)gmail(dot)com, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-06-10 14:19:02 |
| Message-ID: | 20260610.231902.1271202110858446443.ishii@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Henson,
> Hi Tatsuo, Jian,
>
> While doing a self-review pass over the incremental fixes on top of v47 I
> ran into two issues where I'd rather agree on an approach with you before
> I pick one. One of them is a regression I introduced myself in the DEFINE
> memory-leak fix; the other is an original design point from v47. There is
> also a third bug which I plan to handle together with the second one, since
> it can be affected by that change -- I describe it at the end.
>
> I have verified both of the issues below on an assert-enabled build (and a
> non-assert build where relevant).
>
>
> == 1. DEFINE evaluation reuses the per-output-tuple context
> (use-after-free) ==
>
> nocfbot-0039 (the DEFINE memory-leak fix) added a ResetExprContext() in
> update_reduced_frame, but it resets the wrong context.
>
> ps_ExprContext is the per-output-tuple context that ExecWindowAgg resets
> once per output row. update_reduced_frame now resets it once per NFA row,
> while the output row is still being formed -- so a pass-by-ref window
> function result already datum-copied into that per-tuple memory (when
> numfuncs > 1) is freed before ExecProject reads it.
>
> Minimal trigger -- a pass-by-ref window function plus a second one over an
> RPR window:
>
> SELECT lag(company) OVER w, count(*) OVER w FROM stock
> WINDOW w AS (PARTITION BY company ORDER BY tdate
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW INITIAL
> PATTERN (START UP+)
> DEFINE START AS TRUE, UP AS price > PREV(price));
>
> On a CLOBBER_FREED_MEMORY build the lag column comes out as 0x7F garbage;
> in production it is garbage or a crash. (An aggregate is not required --
> lag + first_value hits the same reset via the frame-access path.)
>
> Neither v47 nor the patch is the answer on its own: v47 had no reset here,
> so no use-after-free, but the DEFINE scratch accumulated over the whole
> forward scan (the leak nocfbot-0039 fixed); nocfbot-0039 added the per-row
> reset but on the shared per-output-tuple context. We do want a per-row
> reset -- just not on that context.
>
> So I think this needs a dedicated ExprContext for DEFINE evaluation, reset
> once per NFA row: it keeps the memory bounded without touching the
> per-output-tuple results.
>
> Question: does a dedicated DEFINE ExprContext look right to you?
Can we use winstate->tmpcontext instead?
>
> == 2. PREV/NEXT/FIRST/LAST placeholders collide with user functions ==
>
> The nav operations are polymorphic pg_catalog functions (anyelement, OIDs
> 8126-8133) recognized by funcid in parse_func.c, which collides with
> same-name user functions.
>
> Outside DEFINE, a same-name function masks or clashes with the placeholder:
> with public.last(anyelement), SELECT last(123) fails "cannot use last
> outside a DEFINE clause"; with public.next(numeric), SELECT next(10) fails
> "function next(integer) is not unique"; and even with no user function,
> last(123) errors instead of "function last(integer) does not exist".
>
> Inside DEFINE, a same-name function with an exact-type match beats the
> anyelement placeholder, so PREV(price) silently becomes a plain FuncExpr
> instead of an RPRNavExpr -- a wrong match result with no error (reproduced
> for numeric, text and int). And ruleutils deparses a bare PREV(, so
> reparsing a view under a search_path with public.prev rebinds it (pg_dump
> is safe via search_path = '').
>
> This is original v47 design, not a regression. Per the standard,
> PREV/NEXT/FIRST/LAST are navigation operations with dedicated syntax, not
> general-namespace functions -- the collision comes from mapping them onto
> catalog functions plus search-path resolution.
>
> I haven't found a clean approach yet. Inside DEFINE these names have to be
> the navigation operation (per the standard), yet outside DEFINE they
> shouldn't shadow or break same-name user functions the way the catalog
> placeholders do -- and since the deparse output is unqualified (a bare
> PREV(...)), whatever we choose also has to round-trip cleanly. I'm not
> sure how best to reconcile those.
>
> My rough leaning is to not add catalog functions for these at all: leave
> resolution outside DEFINE exactly as it is today, and only inside DEFINE
> adjust the function-resolution path itself to recognize the navigation
> operations. But that is still quite abstract.
>
> Question: how would you approach this?
I need more time think about this.
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2026-06-10 14:31:48 | Re: [PATCH] Fix compiler warnings by using designated initializers |
| Previous Message | Andres Freund | 2026-06-10 14:12:26 | Re: Heads Up: cirrus-ci is shutting down June 1st |