Re: Discussion on missing optimizations

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Adam Brusselback <adambrusselback(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Discussion on missing optimizations
Date: 2017-10-07 02:38:55
Message-ID: 20171007023855.dl3uhfskmhi2wtad@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-10-06 22:19:54 -0400, Tom Lane wrote:
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in
> themselves.

> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods. If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

Partially agreed. A comment to the article also mentions that some other
database performs more optimizations depending on the cost of the
plan. That's not easy to do in our current plan structure, but I think
it's quite a worthwhile concept.

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach. ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?
>
> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

Yea, I think this should be inferrable from information we essentially
already compute for equivclasses.

> > (i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
> > with OR)
>
> > I can't remember proposals about adding this, but it seems worthwhile to
> > consider. I think we should be able to check for this without a lot of
> > planner overhead.
>
> It would be enormously expensive to check that in the general case with
> a bunch of entries in each IN list. Perhaps it would be OK to add on
> the presumption that few queries would contain multiple IN's on the same
> column in the first place, though. Not sure.

Merging of ORs should be near free if you leave duplicates in there, and
the duplicates aren't a huge concern if your alternative is is to have
them, but also have two clauses to evaluate.

I think the IN AND IN case should usually end up a clear win as
well. Both lists are going to be evaluated for each row anyway - you
don't need a lot of rows where clauses are evaluated to make it worth
the O(n*log(n)) of sorting the lists. Sorting them would be beneficial
for other reasons as well, e.g. it improves access patterns for SAO
index scans.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-10-07 03:44:00 Re: [PATCH] A hook for session start
Previous Message Tom Lane 2017-10-07 02:19:54 Re: Discussion on missing optimizations