Discard ORDER BY/DISTINCT when an ANY/IN sublink is pulled up to a join

From: Ewan Young <kdbase(dot)hack(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Discard ORDER BY/DISTINCT when an ANY/IN sublink is pulled up to a join
Date: 2026-06-10 12:16:30
Message-ID: CAON2xHNBF7cP5gkzpi6xnzxFCePcUCvtAJy4xJZudjWwW0mYow@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When convert_ANY_sublink_to_join() turns an IN/ANY sublink -- or a
provably null-safe NOT IN, which becomes an anti join -- into a semi or
anti join, it currently leaves the sub-select's ORDER BY and DISTINCT
clauses in place. Neither can affect the result of a semi/anti join,
which depends only on whether a matching inner row exists, not on the
order of the inner rows or whether they are distinct.

simplify_EXISTS_query() already drops these clauses for EXISTS; the
ANY/IN path is just the one place that doesn't. The attached patch makes
it consistent: it discards ORDER BY and DISTINCT once we've committed to
the conversion.

if (subselect->limitOffset == NULL &&
subselect->limitCount == NULL &&
!subselect->hasDistinctOn)
{
subselect->distinctClause = NIL;
subselect->sortClause = NIL;
}

Besides saving a useless sort or de-duplication of the inner side, this
lets the sub-select become "simple" so that pull_up_subqueries() can
flatten it (see is_simple_subquery()) instead of leaving a separate
subquery. The effect is most visible for a correlated sub-select, e.g.

explain (costs off)
select * from tenk1 t
where t.hundred in (select b.ten from tenk2 b
where b.unique2 = t.unique1 order b

which today plans as a Nested Loop Semi Join over a Subquer
inner Sort is re-executed per outer row, and with the patch becomes a
plain Hash Semi Join.

The clauses are kept when the sub-select has LIMIT/OFFSET (
and DISTINCT do determine which rows are returned) or uses DISTINCT ON
(which selects particular rows in conjunction with ORDER BY

A semi/anti join's result is a function of the *set* of inn
(existence of a match), not their order or multiplicity. Removing
DISTINCT doesn't change that set or the presence of NULLs;
ORDER BY (absent LIMIT/OFFSET) doesn't change which rows are produced.
This holds regardless of how the inner rows are computed, s
safe with aggregates, window functions and set-returning functions in the
sub-select -- only the query-level sortClause is touched, n
own ORDER BY. The NOT IN -> anti join direction is reached only after the
existing nullability proofs, so it is unaffected. groupCla
alone on purpose, so no RTE_GROUP / hasGroupRTE cleanup is needed (unlike
simplify_EXISTS_query, which discards the whole targetlist)

Removing the clauses turns a forced de-duplication into a c
choice. For IN/semi joins the planner can still unique-ify the inner
(JOIN_UNIQUE_INNER) when that is cheaper, so the previous p
reachable; where it isn't (e.g. a semi join that short-circuits early)
the plan improves. For NOT IN/anti joins the planner adapt
shape (e.g. a Hash Right Anti Join) so the inner is not materialized into
a large hash, and dropping the forced dedup pass is typical
neutral-to-faster. As with any change that widens the plan space a
different plan could occasionally be cheaper -- anti joins
have no JOIN_UNIQUE-style inner-dedup fallback -- but I have not found a
case where the difference is more than marginal.

make check passes with no changes to existing expected outp
adds coverage to subselect.sql: that the DISTINCT and ORDER BY spellings
now plan like the plain ones, that LIMIT and DISTINCT ON ar
that aggregates and window functions are handled safely, and that both
the semi (IN) and anti (NOT IN) paths are exercised.

I searched the archives and didn't find a prior proposal fo
specific simplification; the closest related work is the general
subquery-unnesting discussion[1], which is orthogonal to th
change.

Thoughts welcome.

Thanks,
Ewan Young

[1] https://www.postgresql.org/message-id/67e353e8-c20e-7980-a282-538779edf25b@iki.fi

Attachment Content-Type Size
v1-0001-Discard-ORDER-BY-and-DISTINCT-in-an-ANY-IN-sub-se.patch application/octet-stream 16.0 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2026-06-10 12:17:07 Re: PG19 FK fast path: OOB write and missed FK checks during batched
Previous Message Japin Li 2026-06-10 11:58:14 Re: Fix md5_password_warnings for role/database settings