Re: Discussion on missing optimizations

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Adam Brusselback <adambrusselback(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Discussion on missing optimizations
Date: 2017-10-07 15:15:13
Message-ID: CAKJS1f8XmPBe+ZLQco_VKpKqPwbkGicqyLYXSo=zqW+etqOmYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7 October 2017 at 15:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 9. Unneeded Self JOIN
>
>> Can't remember discussions of this.
>
> I can't get very excited about that one either.
>
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods. If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I think that final sentence lacks imagination.

I've seen plenty of views being called where some column is
unavailable, but the caller joins the very same table again on the
primary key to add the column. There was no brain-dead ORM involved,
just perhaps questionable design. This was very common in my last job
where we had some rats nest of views several layers deep, the core of
which often had some complex logic that nobody dared to try and
replicate.

It would be fairly cheap to check if any of the rtekind==RTE_RELATION
joinlist items have above 1 RangeTblEntry with the same relid. The
joinlist is never that big anyway, and if it was the join search would
be slow. The more expensive part would be to build the join clauses,
check if the expressions on either side of all OpExpr matches and that
nothing else will cause a non-match, then perform the uniqueness check
on those OpExpr operands. We do have some infrastructure to do the
unique checks. Likely the slowdown in planning would be just for
cases with a legitimately useful self-join, I doubt checking for a
duplicate RangeTblEntry->relid would cause much of a concern.

Anyway, this is giving me some feeling of Deja vu.. Perhaps we need
some pg_stats view that shows us planning time and execution time so
that we can get a better idea on how much these things matter in the
average case. We tend to never fare so well in these sorts of
comparisons with commercial databases. It's not hard to imagine
someone with migration plans loading some rats nest of views into
Postgres and taking it for a spin and finding performance is not quite
what they need. It's a bit sad that often the people with the loudest
voices are always so fast to stomp on the ideas for improvements. It
would be much nicer if you'd at least wait for benchmarks before
shooting.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-10-07 16:15:30 Re: Discussion on missing optimizations
Previous Message Tom Lane 2017-10-07 14:23:08 Re: separate serial_schedule useful?