Remove redundant DISTINCT when GROUP BY already guarantees uniqueness

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Remove redundant DISTINCT when GROUP BY already guarantees uniqueness
Date: 2026-06-08 11:12:20
Message-ID: faf8f63f-b476-4ca8-89f9-7cfc104b522c@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

When a query contains both DISTINCT and GROUP BY, planner currently
always creates a  separate plan node to eliminate duplicate output rows.
However, if GROUP BY is present, each output row already corresponds to
exactly one group - meaning the rows produced by GROUP BY are already
unique on their group keys. Any DISTINCT clause  that covers all those
group keys therefore adds no filtering and the de-duplication step is
pure overhead.

This patch teaches the planner to detect this situation and skip
creating the DISTINCT paths entirely. The optimization applies when
three conditions hold:
- the query uses a plan GROUP BY (not GROUPING SETS, which can produce
extra NULL-filled rows across sets),
- the clause is plain DISTINCT (not DISTINCT ON, which has different
semantics),
- and every GROUP BY key appears in the DISTINCT clause.

The last condition is necessary because if a GROUP BY key is absent from
the DISTINCT list, two different groups could still produce identical
output  tuple (e.g. `SELECT DISTINCT 1 FROM t GROUP BY a`)

Example:

```
CREATE TABLE t (a INT, b INT, c INT);
INSERT INTO t VALUES (1,1,10),(1,2,20),(2,1,30),(2,2,40),(1,1,50);
ANALYZE t;

EXPLAIN (COSTS OFF) SELECT DISTINCT a, b, sum(c) FROM t GROUP BY a, b;
```

Before patch:
```
            QUERY PLAN
----------------------------------
 Unique
   ->  Sort
         Sort Key: a, b, (sum(c))
         ->  HashAggregate
               Group Key: a, b
               ->  Seq Scan on t
(6 rows)
```

After patch
```
     QUERY PLAN
---------------------
 HashAggregate
   Group Key: a, b
   ->  Seq Scan on t
(3 rows)
```

Any feedback or suggestions for additional cases are welcome.

--
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachment Content-Type Size
v1-0001-Remove-redundant-DISTINCT-when-GROUP-BY-guarantee.patch text/x-patch 9.8 KB

Browse pgsql-hackers by date

  From Date Subject
Previous Message Tomas Vondra 2026-06-08 11:04:47 Re: Subquery pull-up increases jointree search space