| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | ssam258(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-02-13 01:14:26 |
| Message-ID: | CAAAe_zAzOgB2KR9ACDD2o3QNP_gaCKnUd2bRGgbhcD=og50XXA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Tatsuo,
> However, this raises interesting questions: should we optimize patterns
> by removing {0} quantifiers or simplifying them? And if so, how should
> we handle patterns that become empty after such optimization?
>
> For example:
> - PATTERN (A{0}) → empty pattern
> - PATTERN (A{0} B{0}) → empty pattern
> - PATTERN (A{0} B) → PATTERN (B) after optimization
>
> Empty patterns would result in zero-length matches, which our current
> implementation explicitly treats as invalid (see initialAdvance flag
> logic in nodeWindowAgg.c).
>
> More importantly, I recall that zero-length matches caused serious
> issues during development, which is why we added logic to explicitly
> avoid them.
>
> The reason I cannot immediately provide a concrete plan for A{0}
> support is that I need to deeply understand the semantic meaning of
> zero-length matches in the SQL standard first. Without this
> understanding, any implementation approach could be fundamentally
> flawed.
>
> Specifically, I need to investigate:
> - What zero-length matches mean semantically in RPR
> - How to handle empty patterns according to the standard
> - The correct behavior when a pattern optimizes to nothing
>
> After the current code review phase is complete, I'm also considering
> setting up an Oracle test environment to observe how it handles these
> edge cases. This could provide valuable insights into the expected
> behavior, especially for zero-length matches and empty patterns.
>
Our current implementation cannot support A{0} due to a structural
limitation.

The reduced_frame_map uses a row-based representation
(reduced_frame_map[pos] = val), which can only express matches that
consume at least one row. It cannot represent zero-length matches, which
occur between rows without consuming any row position.

Patterns like A{0}, A*, or A? can produce zero-length matches, leaving
no row to mark as RF_FRAME_HEAD and no position to register in the
frame map.
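
To make the limitation concrete, here is a minimal toy model of a
row-indexed frame map. It is only a sketch, not the code in the patch:
the TOY_* names, the state set, and toy_register_match() are all
hypothetical, and only the idea of marking a frame head per row comes
from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-row states (assumption, not the patch's definitions). */
typedef enum
{
    TOY_RF_NOT_DETERMINED = 0,  /* nothing recorded for this row yet */
    TOY_RF_FRAME_HEAD,          /* first row of a match */
    TOY_RF_SKIPPED              /* later row consumed by the same match */
} ToyFrameState;

/*
 * Record a match that starts at row "start" and consumes "len" rows.
 * Returns 1 if the match was recorded, 0 if a row-indexed map cannot
 * express it (zero length, or a range outside the partition).
 */
static int
toy_register_match(ToyFrameState *map, size_t nrows,
                   size_t start, size_t len)
{
    size_t i;

    assert(map != NULL);

    if (len == 0)
        return 0;               /* no row exists to carry the head mark */
    if (start >= nrows || len > nrows - start)
        return 0;               /* match would run past the last row */

    map[start] = TOY_RF_FRAME_HEAD;
    for (i = start + 1; i < start + len; i++)
        map[i] = TOY_RF_SKIPPED;
    return 1;
}
```

In this model, toy_register_match(map, 4, 1, 2) marks rows 1 and 2,
while a call with len = 0 has nothing to write: every slot in the map
belongs to a row, so a match that sits between rows has no slot.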

We currently prevent this using the initialAdvance flag
(nodeWindowAgg.c), which skips FIN recording during initial epsilon
transitions.
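
The effect of such a guard can be sketched with a toy matcher for a
single pattern variable A{min,max}. This is an assumption-laden
illustration, not the NFA code in nodeWindowAgg.c: toy_match_at(),
ToyMatchResult, and the is_a[] input are hypothetical, and the
initial_advance variable only imitates the role described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    bool   matched;    /* was a match recorded? */
    size_t match_len;  /* rows consumed by the recorded match */
} ToyMatchResult;

/*
 * Greedily match A{min,max} starting at row "start"; is_a[i] says
 * whether row i satisfies A.  Reaching the end of the pattern before
 * any row has been consumed is deliberately not recorded.
 */
static ToyMatchResult
toy_match_at(const bool *is_a, size_t nrows, size_t start,
             size_t min, size_t max)
{
    ToyMatchResult res = {false, 0};
    bool   initial_advance = true;  /* true until the first row is consumed */
    size_t consumed = 0;

    assert(is_a != NULL);

    for (;;)
    {
        /* The pattern may finish here; skip recording if still initial. */
        if (consumed >= min && !initial_advance)
        {
            res.matched = true;
            res.match_len = consumed;   /* later iterations keep the longest */
        }

        if (consumed >= max || start + consumed >= nrows ||
            !is_a[start + consumed])
            break;

        consumed++;
        initial_advance = false;
    }
    return res;
}
```

With rows {A, A, not-A}, A* (min = 0, max = SIZE_MAX) starting at row 0
yields a two-row match, whereas A{0} at row 0 and A* at row 2 yield no
match at all instead of a zero-length one.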

Supporting A{0} requires either restructuring reduced_frame_map to
handle virtual positions, or separate handling for zero-length matches.
Before choosing an approach, we need clarity on what the SQL standard
expects for zero-length match semantics (output generation, aggregate
behavior, etc.).

Given this structural limitation, I'd like to ask: should we keep the
current initialAdvance mechanism (which prevents zero-length matches)
and handle A{0} separately?

Best regards,
Henson