Re: Remove redundant DISTINCT when GROUP BY already guarantees uniqueness

From: Andres Taylor <andres(at)taylor(dot)se>
To: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Remove redundant DISTINCT when GROUP BY already guarantees uniqueness
Date: 2026-06-11 20:40:19
Message-ID: CAFJHCwdGQMwx-=0k9NTApidYObrLp8yz_kF7TJK4d6+=ctapMA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ilia, Zsolt,

+1 to Zsolt's hasTargetSRFs point -- I built v1 and reproduced the wrong
results:

CREATE TABLE t (a int, b int, c int);
INSERT INTO t VALUES (1,1,10),(1,2,20),(2,1,30),(2,2,40),(1,1,50);
SELECT DISTINCT a, unnest(ARRAY[1,1]) AS u FROM t GROUP BY a ORDER BY a, u;

on master: (1,1),(2,1) # DISTINCT collapses the unnest dups
with v1: (1,1),(1,1),(2,1),(2,1) # DISTINCT removed, dups leak

The SRF in the tlist is expanded after grouping but before DISTINCT, so
the DISTINCT is not redundant. Window functions are safe (they don't
change cardinality), so hasTargetSRFs is the right and sufficient guard.

On Zsolt's suggetion to reuse query_is_distinct_for(): I think that's
the right direction, and it fixes more than the SRF case. The GROUP BY
branch there also checks equality_ops_are_compatible(opid, sgc->eqop)
and collations_agree_on_equality(...), neither of which the new
distinct_redundant_by_groupby() checks (it matches on tleSortGroupRef
alone). So reusing it closes the SRF, opclass, and collation gaps at
once.

One wrinkle if you go that way: query_is_distinct_for() can't be called
directly for this, because its first branch inspects query->distinctClause
and would return true trivially (the query's own DISTINCT "proves"
distinctness on its own columns). The reusable part is specifically the
GROUP-BY branch (plus the hasTargetSRFs precheck). Extracting that into a
helper that takes the distinct columns and is called from both sites
seems like the cleanest shape.

FWIW there's a complementary case this structure could host later:
DISTINCT made redundant by a unique index/constraint
(SELECT DISTINCT pk FROM t), proven via relation_has_unique_index_for().
That one needs a NULL gate that the GROUP BY case doesn't, so it's
separate logic -- but a shared "is this DISTINCT redundant?" entry point
with pluggable proofs would fit both. Happy to follow up with that as a
separate patch once this lands.

I tried signing up as a reviewer, but I need to wait for my one week
cool-off period first. I sign up ASAP.

Thanks for working on this!

On Thu, Jun 11, 2026 at 6:01 PM Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> wrote:
>
> Hello!
>
> +static bool
> +distinct_redundant_by_groupby(Query *parse)
>
> I think this also should check hasTargetSRFs:
>
> CREATE TABLE t (a int, b int, c int);
> INSERT INTO t VALUES (1,1,10),(1,2,20),(2,1,30),(2,2,40),(1,1,50);
> SELECT DISTINCT a, unnest(ARRAY[1,1]) AS u FROM t GROUP BY a ORDER BY a, u;
>
> Also, query_is_distinct_for already does something similar (and
> already handles hasTargetSRFs) - would it be possible to extract the
> common part instead of duplicating the logic?

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Álvaro Herrera 2026-06-11 20:21:16 Re: Redundant/mis-use of _(x) gettext macro?