Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: ssam258(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-02-13 01:14:26
Message-ID: CAAAe_zAzOgB2KR9ACDD2o3QNP_gaCKnUd2bRGgbhcD=og50XXA@mail.gmail.com
Lists: pgsql-hackers

Hi Tatsuo,

> However, this raises interesting questions: should we optimize patterns
> by removing {0} quantifiers or simplifying them? And if so, how should
> we handle patterns that become empty after such optimization?
>
> For example:
> - PATTERN (A{0}) → empty pattern
> - PATTERN (A{0} B{0}) → empty pattern
> - PATTERN (A{0} B) → PATTERN (B) after optimization
>
> Empty patterns would result in zero-length matches, which our current
> implementation explicitly treats as invalid (see initialAdvance flag
> logic in nodeWindowAgg.c).
>
> More importantly, I recall that zero-length matches caused serious
> issues during development, which is why we added logic to explicitly
> avoid them.
>
> The reason I cannot immediately provide a concrete plan for A{0}
> support is that I need to deeply understand the semantic meaning of
> zero-length matches in the SQL standard first. Without this
> understanding, any implementation approach could be fundamentally
> flawed.
>
> Specifically, I need to investigate:
> - What zero-length matches mean semantically in RPR
> - How to handle empty patterns according to the standard
> - The correct behavior when a pattern optimizes to nothing
>
> After the current code review phase is complete, I'm also considering
> setting up an Oracle test environment to observe how it handles these
> edge cases. This could provide valuable insights into the expected
> behavior, especially for zero-length matches and empty patterns.
>

Our current implementation cannot support A{0} due to a structural
limitation.

The reduced_frame_map uses a row-based representation
(reduced_frame_map[pos] = val), which can only express matches that consume
at least one row. It cannot represent zero-length matches, which occur
between rows without consuming any row position.

Patterns like A{0}, A*, or A? can produce zero-length matches, which leave
no row to mark as RF_FRAME_HEAD and no position to register in the frame map.
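
To make the limitation concrete, here is a minimal sketch in C, not the
actual patch code. Only RF_FRAME_HEAD is a name from the patch; the other
state names and record_match() are hypothetical, made up for illustration.
It assumes one map slot per row:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative per-row states; only RF_FRAME_HEAD is named in this thread. */
enum
{
    RF_UNDETERMINED = 0,    /* hypothetical: row not evaluated yet */
    RF_FRAME_HEAD,          /* first row of a matched reduced frame */
    RF_SKIPPED,             /* hypothetical: row inside a match, not its head */
    RF_UNMATCHED            /* hypothetical: no match starts at this row */
};

/*
 * Record a match that starts at row "pos" and consumes "nrows" rows.
 * Returns false when the match cannot be expressed in a row-based map.
 */
static bool
record_match(char *map, size_t maplen, size_t pos, size_t nrows)
{
    size_t      i;

    assert(map != NULL);

    /* A zero-length match consumes no row, so there is no slot to mark. */
    if (nrows == 0)
        return false;

    /* The match must fit inside the partition. */
    if (pos >= maplen || nrows > maplen - pos)
        return false;

    map[pos] = RF_FRAME_HEAD;
    for (i = pos + 1; i < pos + nrows; i++)
        map[i] = RF_SKIPPED;
    return true;
}
```

With a four-row map, record_match(map, 4, 1, 2) marks row 1 as the head and
row 2 as covered, while record_match(map, 4, 1, 0) has nothing to write and
returns false.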

We currently prevent this using the initialAdvance flag (nodeWindowAgg.c),
which skips FIN recording during initial epsilon transitions.
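
As a rough illustration of the effect (not of the real NFA code), the sketch
below uses a small recursive matcher. Everything in it is hypothetical: Elem,
match_from(), and match_length() are invented names, each row is reduced to a
single character naming the pattern variable it satisfies, and the check
"consumed > 0" stands in for initialAdvance:

```c
#include <assert.h>
#include <string.h>

/* One pattern element: variable name plus {min,max}; max < 0 is unbounded. */
typedef struct
{
    char        var;
    int         min;
    int         max;
} Elem;

/* Returns the longest match length in rows, or -1 if no match is recorded. */
static int
match_from(const Elem *pat, int npat, int idx, int count,
           const char *rows, int nrows, int pos, int consumed)
{
    int         best = -1;

    /*
     * FIN reached. If no row has been consumed yet, we got here through
     * epsilon moves only, so the (zero-length) match is not recorded.
     */
    if (idx == npat)
        return (consumed > 0) ? consumed : -1;

    /* Try to consume the current row with the current element. */
    if ((pat[idx].max < 0 || count < pat[idx].max) &&
        pos < nrows && rows[pos] == pat[idx].var)
        best = match_from(pat, npat, idx, count + 1,
                          rows, nrows, pos + 1, consumed + 1);

    /* Epsilon move to the next element once the minimum is satisfied. */
    if (count >= pat[idx].min)
    {
        int         r = match_from(pat, npat, idx + 1, 0,
                                   rows, nrows, pos, consumed);

        if (r > best)
            best = r;
    }
    return best;
}

static int
match_length(const Elem *pat, int npat, const char *rows, int pos)
{
    int         nrows = (int) strlen(rows);

    assert(pos >= 0 && pos <= nrows);
    return match_from(pat, npat, 0, 0, rows, nrows, pos, 0);
}
```

Under these assumptions, A{0} on any row and A* on a row that does not
satisfy A report no match (-1) instead of a zero-length one, while A{0} B
behaves like B.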

Supporting A{0} requires either restructuring reduced_frame_map to handle
virtual positions, or separate handling for zero-length matches. Before
choosing an approach, we need clarity on what the SQL standard expects for
zero-length match semantics (output generation, aggregate behavior, etc.).

Given this structural limitation, I'd like to ask: should we keep the
current initialAdvance mechanism (which prevents zero-length matches) and
handle A{0} separately?

Best regards,
Henson
