Re: [HACKERS] Removing useless DISTINCT clauses

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Removing useless DISTINCT clauses
Date: 2018-01-09 20:26:45
Message-ID: 9554.1515529605@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> [ remove_useless_distinct_clauses_v2.patch ]

This is a cute idea, but I'm troubled by a couple of points:

1. Once you don't have all the tlist items shown in DISTINCT, it really is
more like DISTINCT ON, seems like. I am not sure it's a good idea to set
hasDistinctOn, because that engages some planner behaviors we probably
don't want, but I'm also not sure we can get away with just ignoring the
difference. As an example, in allpaths.c there are assorted assumptions
that having a distinctClause but !hasDistinctOn means all output columns
of a subquery are listed in the distinctClause.

2. There's a comment in planner.c to the effect that

* When we have DISTINCT ON, we must sort by the more rigorous of
* DISTINCT and ORDER BY, else it won't have the desired behavior.
* Also, if we do have to do an explicit sort, we might as well use
* the more rigorous ordering to avoid a second sort later. (Note
* that the parser will have ensured that one clause is a prefix of
* the other.)

Removing random elements of the distinctClause will break its
correspondence with the sortClause, with probably bad results.

I do not remember for sure at the moment, but it may be that this
correspondence is only important for the case of DISTINCT ON, in which
case we could dodge the problem by not applying the optimization unless
it's plain DISTINCT. That doesn't help us with point 1 though.

BTW, my dictionary says it's "dependent" not "dependant".

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-01-09 20:36:01 Re: Add %r substitution for psql prompts to show recovery status
Previous Message Alvaro Herrera 2018-01-09 20:19:00 Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()