Re: Removing useless DISTINCT clauses

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing useless DISTINCT clauses
Date: 2018-08-24 02:12:14
Message-ID: 20180824021214.GJ3326@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > * David Rowley (david(dot)rowley(at)2ndquadrant(dot)com) wrote:
> >> On 24 August 2018 at 11:34, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> >>> * David Rowley (david(dot)rowley(at)2ndquadrant(dot)com) wrote:
> >>>> My personal opinion of only being able to completely remove the
> >>>> DISTINCT when there's a single item in the rtable (or a single base
> >>>> table) is that it's just too poor to bother with.
>
> > Hm, so you're suggesting that this isn't the right place for this
> > optimization to be implemented, even now, with the single-relation
> > caveat?
>
> There is no case where planner optimizations should depend on the length
> of the rtable. Full stop.
>
> It could make sense to optimize if there is just one baserel in the join
> tree --- although even that is best checked only after join removal.

Hm, that's certainly a fair point.

> As an example of the difference, such an optimization should be able to
> optimize "select * from view" if the view contains just one base table.
> The rtable will list both the view and the base table, but the view
> is only hanging around for permissions-checking purposes; it should not
> affect the planner's behavior.

This is happening at the same time as some optimizations around GROUP
BY, so is there something different about what's happening there that I
didn't appreciate, or does that optimization suffer from a similar
issue?

> I've not read the patch, but David's reaction makes it sound like its
> processing is done too early. There are right places and wrong places
> to do most everything in the planner, and I do not wish to accept a
> patch that does something in the wrong place.

Right, I definitely agree with you there. This seemed like a reasonable
place given the similar optimization (at least in appearance to me)
being done there for the GROUP BY case. I'm happy to admit that I
haven't looked at it in very much depth (hence my question to David) and
I'm not an expert in this area, but I did want to bring up that the
general idea and the relative trade-offs at least sounded reasonable.

I'll also note that I didn't see these concerns raised earlier on the
thread when I re-read your remarks on it, so I'm a bit concerned that
either this isn't an actual concern or it was missed previously.

Thanks!

Stephen
