From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Removing useless DISTINCT clauses |
Date: | 2018-01-09 22:12:17 |
Message-ID: | CAKJS1f8=KepAZ-UOYnqRfAdwOKWEvt=3vbe4_3QBCmQuHbYKmg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Tom,
Thanks for looking at this.
On 10 January 2018 at 09:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This is a cute idea, but I'm troubled by a couple of points:
>
> 1. Once you don't have all the tlist items shown in DISTINCT, it really is
> more like DISTINCT ON, seems like. I am not sure it's a good idea to set
> hasDistinctOn, because that engages some planner behaviors we probably
> don't want, but I'm also not sure we can get away with just ignoring the
> difference. As an example, in allpaths.c there are assorted assumptions
> that having a distinctClause but !hasDistinctOn means all output columns
> of a subquery are listed in the distinctClause.
hmm, I see:
/*
* If subquery has regular DISTINCT (not DISTINCT ON), we're wasting our
* time: all its output columns must be used in the distinctClause.
*/
if (subquery->distinctClause && !subquery->hasDistinctOn)
return;
I think that particular one would just fail to remove columns from the
subquery if there's a distinct clause (not distinct on). That seems
more like a missed optimisation rather than a future bug. Although it
probably would be a good idea to get rid of the assumption that a
non-NIL distinctClause && !hasDistinctOn means that all output target
items are part of that distinct clause.
I don't see anything else in allpaths.c that would be affected.
Keeping in mind we'll never completely remove a distinctClause, just
possibly remove some items from the list. Everything else just seems
to be checking distinctClause != NIL, or is only doing something if
hasDistinctOn == true.
I'll do some more analysis on places that distinctClause is being used
to check what's safe.
> 2. There's a comment in planner.c to the effect that
>
> * When we have DISTINCT ON, we must sort by the more rigorous of
> * DISTINCT and ORDER BY, else it won't have the desired behavior.
> * Also, if we do have to do an explicit sort, we might as well use
> * the more rigorous ordering to avoid a second sort later. (Note
> * that the parser will have ensured that one clause is a prefix of
> * the other.)
>
> Removing random elements of the distinctClause will break its
> correspondence with the sortClause, with probably bad results.
>
> I do not remember for sure at the moment, but it may be that this
> correspondence is only important for the case of DISTINCT ON, in which
> case we could dodge the problem by not applying the optimization unless
> it's plain DISTINCT. That doesn't help us with point 1 though.
Yeah, it seems better to only try and apply this for plain DISTINCT.
> BTW, my dictionary says it's "dependent" not "dependant".
Oops. Thanks.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2018-01-09 22:14:27 | Re: [HACKERS] Removing LEFT JOINs in more cases |
Previous Message | Tom Lane | 2018-01-09 22:05:49 |