| From: | Haibo Yan <tristan(dot)yim(at)gmail(dot)com> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [PATCH] DISTINCT in plain aggregate window functions |
| Date: | 2026-06-28 17:58:23 |
| Message-ID: | CABXr29EwORuhHY9KhzWW1HsRG98j2rJ6AbOO_TKnkz91imvGiw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Henson,
Thank you for the careful review. This is very helpful.
I updated the first three patches to expand the regression coverage, and
also updated patch 3 to make the current memory limitation of the
grow-only hash path explicit in both the commit message and code
comments.
On Thu, Jun 25, 2026 at 6:03 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
>
> Hi Haibo,
>
> This is an initial, high-level pass over v2 (patches 1-3), not a detailed
> code review -- I'll follow up with more as time permits. For this round I
> wanted to confirm the shape of the feature and the cases it rejects, flag
> where its behavior diverges from non-window DISTINCT, note test coverage,
> and check one interaction. A short list of suggestions is at the end.
>
> Overall the approach looks reasonable, and for the shapes I tried the
> per-row values matched equivalent non-window formulations (correlated /
> GROUP BY rewrites, and Oracle for the whole-partition forms). The points
> below are mostly about matching our own non-window DISTINCT semantics, not
> about the patch being incorrect.
>
> 1. Added surface
> ----------------
> The series makes agg(DISTINCT x) OVER (...) accepted for plain aggregate
> window functions (count/sum/avg/...), which PostgreSQL currently rejects.
> DISTINCT is carried through parse and deparse so it survives view
> definitions, and execution is enabled for a defined subset of frames.
>
> 2. Behavior by frame shape
> --------------------------
> Execution uses one of two strategies depending on the frame:
>
> - Whole-partition frames (the frame is effectively the entire
> partition): the distinct result is the same for every row, so it is
> computed once and reused.
>
> - Running / non-shrinking (grow-only) frames (start at UNBOUNDED
> PRECEDING, no EXCLUDE): the distinct set only grows as rows enter, so it
> is maintained incrementally per row.
>
> 3. Cases that are (intentionally) rejected
> ------------------------------------------
> - DISTINCT on an aggregate that takes more than one argument (roadmap 7)
> - frames that don't start at UNBOUNDED PRECEDING, or that use EXCLUDE
> (roadmap 4-6)
> - in the running case, an argument type that is not hashable
>
> 4. Behavior-level observations
> ------------------------------
> Compared with GROUP BY agg(DISTINCT x), three things differ between the two
> frame strategies:
>
> - Type support: a type that is sortable but not hashable is accepted in
> the whole-partition case yet rejected in the running case, whereas
> GROUP BY DISTINCT accepts it (e.g. count(DISTINCT x::money)).
>
That makes sense as a longer-term direction.
For this series, though, I would prefer to keep the current split. My
goal in patches 2 and 3 is to add the basic executor support in small,
reviewable steps: first the whole-partition sort case, then the simpler
grow-only incremental case, before moving on to true sliding frames.
For the current patch 3 implementation, the grow-only path assumes
hashable input types and errors out otherwise. That is an executor-level
decision for this initial step. A planner-driven choice between sort and
hash paths remains an independent direction that can be explored later.
> - Value order: for order-sensitive aggregates (e.g. array_agg(DISTINCT
> ...)), the order in which distinct values are aggregated differs
> between the two frame strategies, so the same expression can produce
> differently-ordered results.
>
I agree this is worth calling out more explicitly.
For order-insensitive aggregates such as count/sum/min/max, this is not
observable. For order-sensitive aggregates such as array_agg or
string_agg, the result can differ between strategies if no aggregate-
local ORDER BY is present. I do not consider this a correctness issue,
because the SQL standard does not prescribe an ordering for that form.
However, it is a visible plan-dependent behavior difference and should
be documented as such.
I also agree that regression coverage should not rely on unspecified
ordering. In the updated tests I made the order-sensitive grow-only case
deterministic by driving the frame with a unique ORDER BY key, so the
expected result does not depend on flaky input ordering.
> - Memory: the running (hash) path is the only one of these that holds
> its distinct set in an unbounded in-memory structure; everything else,
> including the patch's own whole-partition path, is work_mem-bounded and
> spills to disk:
>
> whole-partition (this patch) sort work_mem-bounded, spills
> running (this patch) hash unbounded, no spill
> GROUP BY agg(DISTINCT x) sort work_mem-bounded, spills
> SELECT DISTINCT, HashAgg hash work_mem-bounded, spills
> SELECT DISTINCT, Sort+Unique sort work_mem-bounded, spills
>
> So a large, high-cardinality partition in the running case can grow the
> hash table without bound -- whereas every other path here, including the
> hash-based HashAgg, is work_mem-bounded and spills.
>
I agree this is a real limitation of the current patch 3 approach, not
the intended end state.
I updated patch 3 to make this more explicit both in the commit message
and in the code comments. In particular, I added a comment in
initialize_windowaggregate() just above the hash-table setup. The
current grow-only hash path is not bounded by work_mem and retains every
distinct value seen in the partition. I agree that bounded-memory
behavior for this path would need further design, rather than being
treated as an acceptable final trade-off.
> 5. Test coverage (measured)
> ---------------------------
> I ran line coverage on the changed executor code. The patch's added and
> changed lines are about 77% covered (219/284); the file as a whole is
> around 91%. The uncovered ~23% of the new lines is not spread out -- it
> concentrates in a few behaviors the functional tests never exercise,
> because those tests center on count/sum/avg over integer arguments:
>
> - strict aggregates (max/min DISTINCT), i.e. the strict/NULL-seed path
> - pass-by-reference argument and transition types (text/numeric)
> - (plus a couple of effectively unreachable guards)
>
> A few more gaps don't show up as uncovered lines: RANGE frames with peer
> rows (tied ORDER BY keys), GROUPS mode, and order-sensitive aggregates
> (per the value-order point above).
>
> 6. Interaction with row pattern recognition (RPR)
> -------------------------------------------------
> I applied this series on top of the RPR patch set and built the two
> together. They combine cleanly -- no textual (rebase) conflict, and the
> regression tests pass without any change to code or tests.
>
> That they coexist is structural, not luck: RPR requires the frame to start
> at CURRENT ROW, while this feature requires UNBOUNDED PRECEDING, so a single
> window can satisfy at most one of them -- a combined query is always
> rejected by one side or the other. As long as these two frame requirements
> stay mutually exclusive, I would expect the two to merge without trouble.
>
> One asymmetry worth noting: the RPR side's CURRENT-ROW requirement follows
> from the standard's definition of row pattern recognition, whereas this
> feature's UNBOUNDED-PRECEDING requirement is one the roadmap plans to relax
> (the sliding-frame patches). So the open question is whether that
> relaxation will ever reach frames that start at CURRENT ROW -- the exact
> shape RPR requires, and the point at which the two features would meet. Is
> CURRENT-ROW-start support part of the plan?
Thanks for checking this interaction. I am glad to hear the current
series composes cleanly with the RPR patches.
I have not yet studied whether the roadmap would eventually reach
CURRENT ROW-start frames. If it does, the RPR interaction will need
careful re-evaluation.
>
> 7. Suggestions
> --------------
> - Type support: the planner should decide sort vs hash from the argument
> type, and the executor should just carry that out -- a non-hashable type
> would then route to sort instead of being rejected at startup.
On the type-support and memory points above, I agree those are the main
longer-term issues for this approach, and I have tried to clarify the
present behavior and limitations in the updated patches.
> - Memory: the running hash seen-set grows without bound. It should spill
> at work_mem the way HashAgg already does for grouping; hash can stay the
> default for the common hashable case.
> - Tests: fill the coverage gaps in section 5 -- strict aggregates,
> pass-by-reference types, RANGE peers, GROUPS mode. I can provide cases
> with cross-checked expected values.
Agreed, and I have already expanded the first three patches in that
direction in the updated version.
Patch 1 itself is unchanged here, since its existing parse/deparse and
executor-side rejection coverage was already adequate. In patches 2 and
3, I added pass-by-reference text cases, a strict-aggregate case for
the whole-partition path, and a more representative grow-only case
using array_agg(DISTINCT …) over a deterministic running frame.
The remaining frame-shape coverage, such as RANGE peers and GROUPS mode,
will come naturally with the later patches that introduce those frame
types, rather than being bolted onto these first three patches.
> - Documentation: there are no SGML changes for this new form. The
> window-function-call synopsis doesn't show the DISTINCT form, and no
> prose mentions it.
Agreed. I have not added the SGML changes yet because I wanted to get
review on the execution model first, and this initial sub-series is
still focused on that part. But this definitely needs to be covered
before the full series is complete.
> - Roadmap ordering (just a thought): a grow-only frame is the add-only
> special case of the sliding machinery in patches 4-6, and the hash path
> is where the section-4 divergences live -- so patch 3 reads more like an
> optimization than a foundation.
I understand the point. From an implementation perspective, though, I
still think it is useful to keep the grow-only case separate.
Although it can be viewed as an add-only special case of the later
sliding machinery, it also lets the incremental DISTINCT state be
reviewed on its own before introducing row exit, refcounted removal,
and the other moving-frame complications in the same patch. So I see it
as both a reviewable intermediate step and a natural strategy for
grow-only frames.
>
> Regards,
> Henson
Thanks again for the review. Please see the updated patches.
Regards,
Haibo
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Parse-and-deparse-DISTINCT-in-window-aggregate-ca.patch | application/octet-stream | 6.6 KB |
| v3-0002-Support-DISTINCT-in-whole-partition-window-aggreg.patch | application/octet-stream | 25.4 KB |
| v3-0003-Extend-DISTINCT-window-aggregates-to-grow-only-fr.patch | application/octet-stream | 28.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Erik Wienhold | 2026-06-28 18:14:11 | Re: CREATE OR REPLACE MATERIALIZED VIEW |
| Previous Message | Feng Wu | 2026-06-28 16:53:04 | Re: [PATCH] Avoid collation lookup for "char" statistics |