| From: | Haibo Yan <tristan(dot)yim(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [PATCH] DISTINCT in plain aggregate window functions |
| Date: | 2026-04-30 01:57:07 |
| Message-ID: | CABXr29EW_+XRD4dTcEN4+=K78OvetMWMqF0qgSv_JwG6D4Gu3g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Apr 7, 2026 at 10:31 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
> Hi Hackers
>
> I’d like to start a patch series to add support for DISTINCT in plain
> aggregate window functions.
>
> PostgreSQL currently rejects cases such as:
>
>
> ---------------------------------------------------------------------------------------------------------
>
> count(DISTINCT x) OVER (PARTITION BY p)
>
> sum(DISTINCT x) OVER ()
>
>
> ---------------------------------------------------------------------------------------------------------
>
> My plan is to implement this incrementally, by frame class and by feature
> dimension, rather than trying to solve every case in a single patch.
>
> For the first step, I’m posting patches 1-2 only and would appreciate your
> review on those.
>
> Patch 1 is intentionally very small:
>
>
> - add parse/deparse plumbing for DISTINCT in plain aggregate window
> functions
> - carry the information through WindowFunc
> - preserve it in ruleutils / deparse
> - but still reject execution
>
> Patch 1 by itself does not add user-visible execution support, so I think
> it is best reviewed together with patch 2.
>
> Patch 2 adds the first real executor support:
>
>
> - plain aggregate window functions only
> - single-argument DISTINCT only
> - whole-partition frames only
>
> That means support for cases where the frame is effectively the entire
> partition, for example:
>
>
> ---------------------------------------------------------------------------------------------------------
>
> count(DISTINCT x) OVER (PARTITION BY p)
> sum(DISTINCT x) OVER ()
> avg(DISTINCT x) OVER (
> PARTITION BY p
> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> )
>
>
> ---------------------------------------------------------------------------------------------------------
>
> The executor approach in patch 2 is deliberately conservative:
>
>
> - collect the partition’s aggregate inputs
> - sort and deduplicate them
> - feed the distinct values into the aggregate transition function
> - finalize once
> - reuse the cached result for all rows in the partition
>
> This avoids the much harder moving-frame cases for now.
>
> My proposed overall roadmap is below:
>
> Patch 1
>
>
> - parse/deparse plumbing only
> - allow DISTINCT to be represented on plain aggregate window functions
> - preserve it through deparse / view definition
> - still reject execution
>
> Patch 2
>
>
> - executor support for whole-partition frames
> - plain aggregate window functions only
> - single-argument DISTINCT only
> - sort-and-dedup implementation
>
> Patch 3
>
>
> - executor support for non-shrinking frames
> - frames starting at UNBOUNDED PRECEDING with no EXCLUDE
> - incremental hash-based seen-set
> - covers default ORDER BY frame and supported ... CURRENT ROW / ...
> FOLLOWING cases
>
> Patch 4
>
>
> - executor support for sliding ROWS frames
> - refcounted DISTINCT state
> - add/remove distinct contributions as rows enter and leave the frame
> - fallback to restart/recompute for aggregates without inverse
> transition support
>
> Patch 5
>
>
> - extend the sliding DISTINCT machinery to sliding RANGE and GROUPS
> - keep the same refcounted model
> - no EXCLUDE yet
>
> Patch 6
>
>
> - support EXCLUDE clauses
> - likely correctness-first, with restart/recompute where incremental
> maintenance is too awkward
>
> Patch 7
>
>
> - support multi-argument DISTINCT
> - upgrade DISTINCT keys from single datum to tuple/composite key
> representation
>
> Patch 8
>
>
> - support aggregate ORDER BY inside window aggregates
> - left until last because it is orthogonal to frame-shape support and
> substantially complicates both parse representation and executor behavior
>
> In short, the roadmap is:
>
>
> 1. plumbing
> 2. whole-partition
> 3. non-shrinking
> 4. sliding ROWS
> 5. sliding RANGE / GROUPS
> 6. EXCLUDE
> 7. multi-arg DISTINCT
> 8. aggregate ORDER BY
>
> For this posting, I’d especially appreciate feedback on:
>
>
> - whether patch 1 + patch 2 is a reasonable first split
> - whether whole-partition-only executor support is a good first
> executable step
> - whether the proposed long-term breakdown seems sensible
>
> Thanks in advance for any review or comments.
>
> Best regards,
>
> Haibo Yan
>
>
> I’ve managed to finish the first sub-series adding initial support for
DISTINCT in plain aggregate window functions.
Patch 1 teaches the parser and deparser to accept DISTINCT in plain
window aggregates. This is representation-only and does not change
execution yet.
Patch 2 adds executor support for the simplest case, whole-partition
frames, using a sort-and-deduplicate path.
Patch 3 extends that support to non-shrinking frames, where rows only
enter the frame, by using an incremental hash-based seen-set instead of
restarting the aggregate for each row.
Please review.
Regards,
Haibo
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0003-Extend-DISTINCT-window-aggregates-to-grow-only-fr.patch | application/octet-stream | 26.7 KB |
| v2-0002-Support-DISTINCT-in-whole-partition-window-aggreg.patch | application/octet-stream | 24.1 KB |
| v2-0001-Parse-and-deparse-DISTINCT-in-window-aggregate-ca.patch | application/octet-stream | 6.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-04-30 02:33:35 | Re: Fix the error hint message and test for reset_shared with unknown stats type |
| Previous Message | Thomas Munro | 2026-04-30 00:40:52 | Re: Experimenting with wider Unicode storage |