Re: Row pattern recognition

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Henson Choi <assam258(at)gmail(dot)com>
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, Tatsuo Ishii <ishii(at)postgresql(dot)org>
Subject: Re: Row pattern recognition
Date: 2026-06-09 08:57:20
Message-ID: CACJufxFnwdQSApt2vWwYCd0gtf+JjFDxT2hbxHi=+dhFJc+-1g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

The below in advanced.sgml actually seems better suited for
explain.sgml (which currently has no changes).
At the very least, explain.sgml should mention it.

<para>
When examining query plans for Row Pattern Recognition with
<command>EXPLAIN</command>, the pattern output may include special
markers that indicate optimization opportunities. A double quote
<literal>"</literal> marks where pattern absorption can occur,
and a single quote <literal>'</literal> marks absorbable elements
within a branch. For example, <literal>a+"</literal> indicates that
repeated matches of <literal>a</literal> can be absorbed, while
<literal>(a' b')+"</literal> shows that both <literal>a</literal>
and <literal>b</literal> within the group are absorbable.
These markers are primarily useful for understanding internal
optimization behavior.
</para>

<programlisting>
......
PATTERN (LOWPRICE UP+ DOWN+)
DEFINE
LOWPRICE AS price &lt;= 100,
UP AS price &gt; PREV(price),
DOWN AS price &lt; PREV(price)
);
</programlisting>
LOWPRICE, UP, DOWN should be lower case, since they are not keywords?

scanRPRPatternRecursive
It would be better to replace all ``non-trivial quantifier`` to
``non-trivial quantifier (not {1,1}).``

I want to rename allocateRPRPattern to makeRPRPattern, what do you think?

RPRPatternNode.reluctant_location is not necessary. we never use it
for error reporting,
and since it's restricted to the end of a pattern, it adds very little value.

Is nav_volatile_func_checker really necessary?
We already have contain_volatile_functions and
contain_volatile_functions_not_nextval.
Also per https://www.postgresql.org/message-id/626986.1776785090@sss.pgh.pa.us
The convention is not to check expression volatility during parse
analysis stage.

+bool trailing_alt pg_node_attr(query_jumble_ignore);
I may check whether trailing_alt is necessary later, but
query_jumble_ignore is not needed because it's only manipulated in
gram.y and we do JumbleQuery in parse_analyze_fixedparams, which
obviously is later than all gram.y related work.
----------------------------------------------
v47-0001-v47-rpr-reformat-gram.y-again.nocfbot
The HINT message below is somewhat misleading, since the question mark
(?) is actually after the range quantifier.
Change errmsg to errmsg("invalid token \"%s\" after range quantifier",
$5) will make the error message clearer.

+SELECT FROM rpr_err WINDOW w AS ( ROWS BETWEEN CURRENT ROW AND 1
FOLLOWING PATTERN (A {1,2}??) DEFINE A AS TRUE);
+ERROR: invalid token "??" after range quantifier
+LINE 1: ...TWEEN CURRENT ROW AND 1 FOLLOWING PATTERN (A {1,2}??) DEFINE...
+ ^
+HINT: Only "?" is allowed after {n,m} to make it reluctant.
----------------------------------------------
v47-0002-v47-rpr-regress-tests-refactoring.nocfbot

-- Consecutive GROUP merge: (A B){2} (A B)+ -> (a b){3,}
-- Exercises the child->max == RPR_QUANTITY_INF branch in
mergeConsecutiveGroups,
--- where a finite prev ((A B){2,2}) meets an infinite child ((A B)+).

We should just drop these comments from the regress tests.
mergeConsecutiveGroups is a static function that might change, and we don't want
to have to update all the tests every time it does. Also the comments meaning
is still totally clear without them.
----------------------------------------------
v47-0003-refactor-many-functions-in-rpr.c.nocfbot
I did some major refactoring in rpr.c. Instead of allocating new lists to
remove duplicates, we now just mutate them in place using list_delete_nth_cell
(a common pattern in the codebase).
for overflow handling, it would be better use pg_mul_s32_overflow,
pg_add_s32_overflow.
See the commit message for the details.

v47-0004-another-v47-misc-refactoring.nocfbot
Some misc refactoring, that seems to make sense to me, see the commit
message for rationale.

The above v47-0001, v47-0002, and v47-0003, v47-0004 are based on
https://github.com/assam258-5892/postgres
latest commit.

--
jian
https://www.enterprisedb.com/

Attachment Content-Type Size
v47-0004-another-v47-misc-refactoring.nocfbot application/octet-stream 5.1 KB
v47-0002-v47-rpr-regress-tests-refactoring.nocfbot application/octet-stream 14.5 KB
v47-0001-v47-rpr-reformat-gram.y-again.nocfbot application/octet-stream 11.6 KB
v47-0003-v47-refactor-many-functions-in-rpr.c.nocfbot application/octet-stream 21.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Akshay Joshi 2026-06-09 08:58:15 Re: [PATCH] Add pg_get_table_ddl() to reconstruct CREATE TABLE statements
Previous Message Tatsuo Ishii 2026-06-09 08:13:07 Re: Row pattern recognition