Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-02-27 11:30:57
Message-ID: CAAAe_zDaaOH4kaNfeN3Gk9u8VyTpUm-UgFebKc1KKB8o+PSjPg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tatsuo,

Currently we do not account the cost of RPR while planning. Attached
> is the first attempt to try to estimate the RPR costs. The cost model
> is very simple:
>
> expression cost per PATTERN variable * number of input tuples
>
> Any idea to make this estimation better?
>

> foreach(lc, windowFuncs)
> {
> ...
> + /* also add DEFINE clause expressions' cost to per-input-row costs */
> + if (winclause->rpPattern)
> + {
> + List *pattern_vars; /* list of pattern variable names */
> + ListCell *lc2;
> +
> + pattern_vars = collectPatternVariables(winclause->rpPattern);
> +
> + /* iterate according to the pattern variable */
> + foreach(lc2, pattern_vars)
> + {
> + char *ptname = strVal((char *) lfirst(lc2));

`collectPatternVariables` returns a list of String nodes (via
`makeString()`), so `strVal(lfirst(lc2))` is the idiomatic form.
The `(char *)` cast is misleading.

There is also a correctness issue: DEFINE expressions belong to the
window clause, not to individual window functions, so their cost
should not be multiplied by the number of window functions sharing
the clause.

The fix is to compute the DEFINE cost once outside the loop and add
it to `startup_cost` and `total_cost` directly, after the
`foreach(lc, windowFuncs)` block.

Regarding the cost model: the NFA executor evaluates all DEFINE
expressions once per row into a shared `nfaVarMatched[]` array that
all active contexts read from, and contexts advance strictly forward
so no prior row is ever re-evaluated. The one-evaluation-per-row
cost model is therefore accurate. NFA-aware cost
modeling could be built on top of this foundation in a separate patch
down the road, once the NFA implementation has matured.

For now, the DEFINE expression costs themselves already serve as a
natural penalty — a window clause with RPR will consistently appear
more expensive than a comparable plain window function. This gives
the surrounding plan a reasonable cost signal for decisions such as
join ordering and materialization of RPR subqueries. So the current
approach is reasonable as a first step.

Other than that, the approach looks good to me. Would it be okay if
I revise the patch along those lines?

Best regards,
Henson

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2026-02-27 11:31:14 Re: Skipping schema changes in publication
Previous Message Andrey Borodin 2026-02-27 11:21:15 Re: amcheck: add index-all-keys-match verification for B-Tree