[PATCH] DISTINCT in plain aggregate window functions

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] DISTINCT in plain aggregate window functions
Date: 2026-04-08 05:31:39
Message-ID: CABXr29H2X+HtaPw-R3EheZUgv9fM7nSBjQCCaWCRv62mDYdM3w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers

I’d like to start a patch series to add support for DISTINCT in plain
aggregate window functions.

PostgreSQL currently rejects cases such as:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)

sum(DISTINCT x) OVER ()

---------------------------------------------------------------------------------------------------------

My plan is to implement this incrementally, by frame class and by feature
dimension, rather than trying to solve every case in a single patch.

For the first step, I’m posting patches 1-2 only and would appreciate your
review on those.

Patch 1 is intentionally very small:

- add parse/deparse plumbing for DISTINCT in plain aggregate window
functions
- carry the information through WindowFunc
- preserve it in ruleutils / deparse
- but still reject execution

Patch 1 by itself does not add user-visible execution support, so I think
it is best reviewed together with patch 2.

Patch 2 adds the first real executor support:

- plain aggregate window functions only
- single-argument DISTINCT only
- whole-partition frames only

That means support for cases where the frame is effectively the entire
partition, for example:

---------------------------------------------------------------------------------------------------------

count(DISTINCT x) OVER (PARTITION BY p)
sum(DISTINCT x) OVER ()
avg(DISTINCT x) OVER (
PARTITION BY p
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

---------------------------------------------------------------------------------------------------------

The executor approach in patch 2 is deliberately conservative:

- collect the partition’s aggregate inputs
- sort and deduplicate them
- feed the distinct values into the aggregate transition function
- finalize once
- reuse the cached result for all rows in the partition

This avoids the much harder moving-frame cases for now.

My proposed overall roadmap is below:

Patch 1

- parse/deparse plumbing only
- allow DISTINCT to be represented on plain aggregate window functions
- preserve it through deparse / view definition
- still reject execution

Patch 2

- executor support for whole-partition frames
- plain aggregate window functions only
- single-argument DISTINCT only
- sort-and-dedup implementation

Patch 3

- executor support for non-shrinking frames
- frames starting at UNBOUNDED PRECEDING with no EXCLUDE
- incremental hash-based seen-set
- covers default ORDER BY frame and supported ... CURRENT ROW / ...
FOLLOWING cases

Patch 4

- executor support for sliding ROWS frames
- refcounted DISTINCT state
- add/remove distinct contributions as rows enter and leave the frame
- fallback to restart/recompute for aggregates without inverse
transition support

Patch 5

- extend the sliding DISTINCT machinery to sliding RANGE and GROUPS
- keep the same refcounted model
- no EXCLUDE yet

Patch 6

- support EXCLUDE clauses
- likely correctness-first, with restart/recompute where incremental
maintenance is too awkward

Patch 7

- support multi-argument DISTINCT
- upgrade DISTINCT keys from single datum to tuple/composite key
representation

Patch 8

- support aggregate ORDER BY inside window aggregates
- left until last because it is orthogonal to frame-shape support and
substantially complicates both parse representation and executor behavior

In short, the roadmap is:

1. plumbing
2. whole-partition
3. non-shrinking
4. sliding ROWS
5. sliding RANGE / GROUPS
6. EXCLUDE
7. multi-arg DISTINCT
8. aggregate ORDER BY

For this posting, I’d especially appreciate feedback on:

- whether patch 1 + patch 2 is a reasonable first split
- whether whole-partition-only executor support is a good first
executable step
- whether the proposed long-term breakdown seems sensible

Thanks in advance for any review or comments.

Best regards,

Haibo Yan

Attachment Content-Type Size
v1-0001-Support-parsing-and-deparsing-DISTINCT-in-window-.patch application/octet-stream 8.8 KB
v1-0002-Implement-executor-support-for-DISTINCT-in-whole-.patch application/octet-stream 27.1 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-04-08 05:32:15 Re: Do we still need MULE_INTERNAL?
Previous Message Amit Kapila 2026-04-08 05:30:58 Re: [Proposal] Adding Log File Capability to pg_createsubscriber