From: | Jim Finnerty <jfinnert(at)amazon(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Removing useless DISTINCT clauses |
Date: | 2018-08-22 20:11:55 |
Message-ID: | 1534968715834-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
distinct_optimization_v6.patch
<http://www.postgresql-archive.org/file/t348990/distinct_optimization_v6.patch>
Here is an update to this thread, for potential inclusion in v12. I
couldn't get the most recent 'v5' patch to apply cleanly, so I recreated a
v6 patch on PG10.5 by hand, and made a few changes and improvements:
- (a) If there is only one relation and the PK is present in the SELECT
list, then the distinctClause can be removed. Thus, a query of the form
SELECT DISTINCT pk, * FROM t; recognizes that the body of the SELECT is
already distinct on pk, and doesn't do an unnecessary DISTINCT operation
- (b) If there is only one relation, the PK is present, and there are no
aggregates, then the groupByClause can similarly be removed
- (c) If DISTINCT ON is specified, but no ORDER BY is specified, then it
acts like a regular DISTINCT
- (d) If the distinct clause is modified, this fact is recorded in a new
bool in the Query struct so that when the distinct was modified,
create_distinct_paths no longer Asserts that the needed pathkeys are
contained in the distinct_pathkeys.
- (e) Since the new bool is in the Query struct,
copyfuncs/equalfuncs/outfuncs/readfuncs support is also provided. This will
cause a problem unless the pg_rewrite catalog is regenerated.
The pg_rewrite catalog contains a serialized representation of the Query
node in its ev_action column. If there is a way to recreate the contents of
the pg_rewrite relation without bumping the catversion, can someone please
explain how? If not, then this change is incomplete and would require a new
catalog version (catversion.h) too.
Additional work on this patch would be desirable. It should check for
unique + not null, in addition to just the pk constraint. The DISTINCT
could be eliminated in cases with multiple relations if all the joins are
1:1, although that would arguably belong in a different patch.
/Jim
p.s. the v6 patch works for the problem case that Tom Lane reported with the
v5 patch
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From | Date | Subject | |
---|---|---|---|
Next Message | Nikita Glukhov | 2018-08-22 21:29:17 | Re: jsonpath |
Previous Message | Daniel Verite | 2018-08-22 20:05:41 | Re: csv format for psql |