| From: | Antonin Houska <ah(at)cybertec(dot)at> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Avoid unnecessary processing of DISTINCT clause (Was: Unique Keys) |
| Date: | 2025-11-19 18:42:26 |
| Message-ID: | 206420.1763577746@localhost |
| Lists: | pgsql-hackers |
[1] uses a bottom-up approach to check whether the scan/join path already
happens to produce distinct rows, so that the DISTINCT step of planning can be
avoided. It does so by conveying uniqueness information ("unique keys") from
individual table/index scans to the upper planner. The concept is similar to
"path keys", which describe the ordering of the path output. However, unlike
"path keys", the "unique keys" do not (AFAICS) influence any decisions made
during scan/join planning.
Here I try to introduce a top-down approach that deduces path distinctness
from information already present in the path tree. As the patch does not add
any information that the planner would have to propagate from lower to upper
nodes, it's less invasive than [1]. (Unlike [1], I haven't implemented the
"single row optimization" yet, but I see no reason why it shouldn't be
possible.)
For index/scan paths, the patch naturally uses unique indexes as the "source
of uniqueness". For joins, it relies on the JoinPath.inner_unique field. The
theory is that a join produces unique rows if 1) the outer path does, and 2)
no more than one inner row matches each outer row (i.e. the inner path does
not "duplicate" the outer rows). For more details, please see the commit
message and the patch itself.
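To make the join rule concrete, here is a minimal C sketch on a toy path tree. Everything in it (`ToyPath`, `toy_scan()`, `toy_join()`, `toy_path_is_unique()`) is hypothetical and is not PostgreSQL's `Path`/`JoinPath` API. It assumes that a scan is unique when its table has a unique index on NOT NULL columns. It models only inner and left joins. It also takes the `inner_unique` flag as given, standing in for `JoinPath.inner_unique`.

```c
#include <stdbool.h>

/* Hypothetical toy model; NOT PostgreSQL's Path/JoinPath structs. */
typedef enum
{
    TOY_SCAN,
    TOY_JOIN
} ToyPathKind;

typedef struct ToyPath
{
    ToyPathKind kind;
    bool        has_unique_key;   /* TOY_SCAN: unique index on NOT NULL cols */
    const struct ToyPath *outer;  /* TOY_JOIN: outer input */
    const struct ToyPath *inner;  /* TOY_JOIN: inner input, not consulted */
    bool        inner_unique;     /* TOY_JOIN: <= 1 inner match per outer row */
} ToyPath;

ToyPath
toy_scan(bool has_unique_key)
{
    ToyPath p = {.kind = TOY_SCAN, .has_unique_key = has_unique_key};

    return p;
}

ToyPath
toy_join(const ToyPath *outer, const ToyPath *inner, bool inner_unique)
{
    ToyPath p = {.kind = TOY_JOIN, .outer = outer, .inner = inner,
                 .inner_unique = inner_unique};

    return p;
}

/*
 * A scan is unique if backed by a unique key.  A join (inner or left only,
 * in this toy model) is unique if its outer input is unique and no outer
 * row is matched by more than one inner row.
 */
bool
toy_path_is_unique(const ToyPath *path)
{
    if (path->kind == TOY_SCAN)
        return path->has_unique_key;

    return path->inner_unique && toy_path_is_unique(path->outer);
}
```

Under these assumptions, joining `orders` (outer, unique) to `customers` on the customer's primary key yields unique rows. Joining `orders` to their line items does not, even if the items table has its own primary key, because one order can match several item rows. The inner side's own uniqueness is never consulted; only `inner_unique` matters.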
Finally, the patch checks whether the final scan/join output would still be
unique if it contained only the DISTINCT expressions. If so, no additional
processing (such as a UniquePath or AggPath) is needed.
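The final check can be sketched as a subset test. Suppose some set of key columns makes the scan/join output unique and every one of those columns appears among the DISTINCT expressions. Then projecting onto those expressions cannot create duplicates. The helper below is hypothetical: `key_covered_by_distinct()` is not a PostgreSQL function, and it identifies columns by integers rather than comparing expressions. It also assumes the key columns are NOT NULL. DISTINCT treats NULLs as equal, whereas a default unique index accepts multiple NULLs.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if every column of a known unique key also
 * appears in the DISTINCT list, i.e. the projected rows stay unique.
 * Columns are plain integers here (an assumption of this sketch).
 */
bool
key_covered_by_distinct(const int *key, size_t nkey,
                        const int *distinct, size_t ndistinct)
{
    for (size_t i = 0; i < nkey; i++)
    {
        bool found = false;

        for (size_t j = 0; j < ndistinct; j++)
        {
            if (key[i] == distinct[j])
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;       /* a key column is projected away */
    }
    return true;
}
```

For example, with the key `orders.id`, `SELECT DISTINCT o.id, c.name` over the unique orders/customers join above would pass the test, while `SELECT DISTINCT c.name` would not and still needs a deduplication step. A nested loop is adequate for the few columns a DISTINCT list usually has.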
Is there anything wrong with this approach, either conceptually or in the
details?
[1] https://www.postgresql.org/message-id/7mlamswjp81p.fsf%40e18c07352.et15sqa
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Avoid-DISTINCT-step-if-the-input-path-already-genera.patch | text/x-diff | 29.2 KB |