Re: Discussion on missing optimizations

From: Nico Williams <nico(at)cryptonector(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Adam Brusselback <adambrusselback(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Discussion on missing optimizations
Date: 2017-10-07 20:40:08
Message-ID: 20171007204007.GB3093@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 06, 2017 at 10:19:54PM -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> >> The article in question is here:
> >>
> > That's interesting.
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.

That's true for some of those. But some of them might become important
when you start pushing WHERE constraints from outside into inner table
sources and subqueries, as dumb-looking constraints can simply appear
from pushing non-dumb-looking constraints.

More than the op optimizations would make a big difference for me:

- turning subqueries into joins

- turning ORs into UNIONs

It is easy enough to work around the lack of this optimization in
many cases, but it does make queries more verbose.

- pushing WHERE constraints from outer queries into the table source
queries (_including_ VIEWs)

- determining that some table in a query that had WHERE constraints
pushed into it... now has a very well-filled out lookup key,
therefore it's the one that should be the table source to start
the plan with, i.e., that it should be first in the outermost loop
of a nested loop join

For me these two would be huge wins. I have to resort to
functions with roughly the same body as views just so that I can
have the optimizer pick the correct plan. This causes a lot of
code duplication in my schemas.

- pushing WHERE constraints from outer queries into HAVING thence WHERE
constraints on GROUP BY queries where the outer constraints are on
columns used to GROUP BY

I find myself making two versions of views that do aggregation: one
that does not, and one that does. This allows me to use the
non-aggregating view in contexts where I need this optimization, but
then I have to re-code the aggregation at that layer. Again, lots of

These sorts of optimizations are huge.

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach. ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?

If you can reduce the number of compilations / optimization passes for
statements, then spending more time in the optimizer is not a big deal.
So, when invoked via PREPARE I would say spending more cycles looking
for this sort of thing is OK, but in many other cases it's not.

Also, sometimes these cases crop up do to pushing constraints into VIEWs
and sub-queries. In those cases then constant sub-expression
elimination can be a win.

> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

I'd expect that column = column is not trivial to turn into TRUE, not
unless those columns are NOT NULLable.

> > 9. Unneeded Self JOIN
> > Can't remember discussions of this.
> I can't get very excited about that one either.
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods. If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I can't get very excited about this one either, though I do believe it
can arise as the author says, "when you build complex views and JOIN
them to each other". Maybe I'm not excited about it because I've not
needed it :)


In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-10-07 20:46:55 Re: Horrible CREATE DATABASE Performance in High Sierra
Previous Message Konstantin Knizhnik 2017-10-07 19:58:55 Re: Slow synchronous logical replication