| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | jian(dot)universality(at)gmail(dot)com, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-06-12 03:40:50 |
| Message-ID: | 20260612.124050.1286931795409220006.ishii@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Henson,
> Hi Tatsuo,
>
> == 1. DEFINE evaluation use-after-free (to Tatsuo) ==
>
>>> nocfbot-0039 (the DEFINE memory-leak fix) added a ResetExprContext() in
>>> update_reduced_frame, but it resets the wrong context.
>>>
>>> ps_ExprContext is the per-output-tuple context that ExecWindowAgg resets
>>> once per output row. update_reduced_frame now resets it once per NFA
> row,
>>> while the output row is still being formed -- so a pass-by-ref window
>>> function result already datum-copied into that per-tuple memory (when
>>> numfuncs > 1) is freed before ExecProject reads it.
>>>
>>> Neither v47 nor the patch is the answer on its own: v47 had no reset
> here,
>>> so no use-after-free, but the DEFINE scratch accumulated over the whole
>>> forward scan (the leak nocfbot-0039 fixed); nocfbot-0039 added the
> per-row
>>> reset but on the shared per-output-tuple context. We do want a per-row
>>> reset -- just not on that context.
>>>
>>> So I think this needs a dedicated ExprContext for DEFINE evaluation,
> reset
>>> once per NFA row: it keeps the memory bounded without touching the
>>> per-output-tuple results.
>>>
>>> Question: does a dedicated DEFINE ExprContext look right to you?
>>
>> Can we use winstate->tmpcontext instead?
>
> I don't think we can. tmpcontext is idle where update_reduced_frame
> would reset it, but not *during* DEFINE evaluation: a DEFINE with NEXT()
> re-enters the spooler from inside its ExecEvalExpr (ExecEvalRPRNavSet
> -> window_gettupleslot -> spool_tuples), and spool_tuples resets
> winstate->tmpcontext (via ExecQualAndReset on partEqfunction) for every
> input row it pulls under a PARTITION BY. ExecEvalRPRNavRestore parks a
> pass-by-ref nav result in that per-tuple memory to survive until the
> next reset. To give one example,
>
> -- the cast forces a pass-by-ref result; a by-value bigint is safe
> DEFINE b AS PREV(price::numeric) < NEXT(price::numeric)
>
> frees PREV's result when NEXT spools the next row, before numeric_lt()
> reads it -- the same use-after-free as nocfbot-0039, just on a different
> context. It is the normal forward-scan path (the NFA runs ahead of the
> spool), and after a tuplestore spill even NEXT-free DEFINEs hit it.
>
> More generally, tmpcontext fits its existing users because each one sets
> its slots, evaluates, and resets within a shallow, straight-line region;
> it is best kept to that limited, low-depth usage. DEFINE evaluation
> re-enters the spooler mid-expression, so it cannot honor that contract.
>
> So I think the dedicated DEFINE ExprContext is the right call: a
> once-per-NFA-row reset is a third cadence, distinct from tmpcontext
> (per-input-tuple) and ps_ExprContext (per-output-tuple) -- the same
> one-context-per-cadence pattern nodeWindowAgg and nodeAgg already use.
Thanks for the explanation. Got it. We need the third ExprContext.
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-06-12 03:48:39 | postgres_fdw: fix cumulative stats after imported foreign-table stats |
| Previous Message | Ewan Young | 2026-06-12 03:36:03 | Fast-path FK checks reject valid inserts for domain-typed FK columns |