Re: Removing useless DISTINCT clauses

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: David Rowley <dgrowley(at)gmail(dot)com>
Subject: Re: Removing useless DISTINCT clauses
Date: 2018-03-21 03:29:15
Message-ID: 152160295559.12147.1676278544502314555.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

This is a review of the patch to remove useless DISTINCT columns from the DISTINCT clause
https://www.postgresql.org/message-id/CAKJS1f8UMJ137sRuVSnEMDDpa57Q71JuLZi4yLCFMekNYVYqaQ%40mail.gmail.com

Contents & Purpose
==================

This patch removes any additional columns in the DISTINCT clause when the
table's primary key columns are entirely present in the DISTINCT clause. This
optimization works because the PK columns functionally determine the other
columns in the relation. The patch includes regression test cases.

Initial Run
===========
The patch applies cleanly to HEAD. All tests which are run as part of the
`installcheck` target pass correctly (all existing tests pass, all new tests
pass with the patch applied and fail without it). All TAP tests pass. All tests
which are run as part of the `installcheck-world` target pass except one of the
Bloom contrib module tests in `contrib/bloom/sql/bloom.sql`,
`CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);`
which is currently also failing (crashing) for me on master, so it is unrelated to the
patch. The test cases seem to cover the new behavior.

Functionality
=============
Given that this optimization is safe for the GROUP BY clause (you can remove
functionally dependent attributes from the clause there as well), the logic
does seem to follow for DISTINCT. It seems semantically correct to do this. As
for the implementation, the author seems to have reasonably addressed concerns
expressed over the distinction between DISTINCT and DISTINCT ON. As far as I
can see, this should be fine.

Code Review
===========
For a small performance hit but an improvement in readability, the length check
can be moved from the individual group by and distinct clause checks into the
helper function

if (list_length(parse->distinctClause) < 2)
return;

and

if (list_length(parse->groupClause) < 2)
return;

can be moved into `remove_functionally_dependent_clauses`

Comments/Documentation
=======================

The main helper function that is added `remove_functionally_dependent_clauses`
uses one style of comment--with the name of the function and then the rest of
the description indented which is found some places in the code,
/*
* remove_functionally_dependent_clauses
* Processes clauselist and removes any items which are deemed to be
* functionally dependent on other clauselist items.
*
* If any item from the list can be removed, then a new list is built which
* does not contain the removed items. If no item can be removed then the
* original list is returned.
*/

while other helper functions in the same file use a different style--all lines
flush to the side with a space. I'm not sure which is preferred

/*
* limit_needed - do we actually need a Limit plan node?
*
* If we have constant-zero OFFSET and constant-null LIMIT, we can skip adding
* a Limit node. This is worth checking for because "OFFSET 0" is a common
* locution for an optimization fence. (Because other places in the planner
...

it looks like the non-indented ones are from older commits (2013 vs 2016).

The list length change is totally optional, but I'll leave it as Waiting On Author in case the author wants to make that change.

Overall, LGTM.

The new status of this patch is: Waiting on Author

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-21 03:43:16 Re: [HACKERS] taking stdbool.h into use
Previous Message Michael Paquier 2018-03-21 03:24:41 Re: [HACKERS] taking stdbool.h into use