| From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
|---|---|
| To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Remove redundant DISTINCT when GROUP BY already guarantees uniqueness |
| Date: | 2026-06-08 11:12:20 |
| Message-ID: | faf8f63f-b476-4ca8-89f9-7cfc104b522c@tantorlabs.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
When a query contains both DISTINCT and GROUP BY, planner currently
always creates a separate plan node to eliminate duplicate output rows.
However, if GROUP BY is present, each output row already corresponds to
exactly one group - meaning the rows produced by GROUP BY are already
unique on their group keys. Any DISTINCT clause that covers all those
group keys therefore adds no filtering and the de-duplication step is
pure overhead.
This patch teaches the planner to detect this situation and skip
creating the DISTINCT paths entirely. The optimization applies when
three conditions hold:
- the query uses a plan GROUP BY (not GROUPING SETS, which can produce
extra NULL-filled rows across sets),
- the clause is plain DISTINCT (not DISTINCT ON, which has different
semantics),
- and every GROUP BY key appears in the DISTINCT clause.
The last condition is necessary because if a GROUP BY key is absent from
the DISTINCT list, two different groups could still produce identical
output tuple (e.g. `SELECT DISTINCT 1 FROM t GROUP BY a`)
Example:
```
CREATE TABLE t (a INT, b INT, c INT);
INSERT INTO t VALUES (1,1,10),(1,2,20),(2,1,30),(2,2,40),(1,1,50);
ANALYZE t;
EXPLAIN (COSTS OFF) SELECT DISTINCT a, b, sum(c) FROM t GROUP BY a, b;
```
Before patch:
```
QUERY PLAN
----------------------------------
Unique
-> Sort
Sort Key: a, b, (sum(c))
-> HashAggregate
Group Key: a, b
-> Seq Scan on t
(6 rows)
```
After patch
```
QUERY PLAN
---------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t
(3 rows)
```
Any feedback or suggestions for additional cases are welcome.
--
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Remove-redundant-DISTINCT-when-GROUP-BY-guarantee.patch | text/x-patch | 9.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Tomas Vondra | 2026-06-08 11:04:47 | Re: Subquery pull-up increases jointree search space |