Re: Removing unneeded self joins

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2018-05-16 20:04:14
Message-ID: CANP8+jLFEQBKRz9q1c2yWK9so4K44Z24sQacoZrCew9NZhcf1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 May 2018 at 11:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, May 16, 2018 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
>>> There is a join optimization we don't do -- removing inner join of a
>>> table with itself on a unique column. Such joins are generated by
>>> various ORMs, so from time to time our customers ask us to look into
>>> this. Most recently, it was discussed on the list in relation to an
>>> article comparing the optimizations that some DBMS make [1].
>>
>> This is the sort of thing that I always wonder why the customers don't
>> ask the ORM to stop generating such damfool queries. Its *expensive*
>> for us to clean up after their stupidity; almost certainly, it would
>> take far fewer cycles, net, for them to be a bit smarter in the first
>> place.
>
> The trouble, of course, is that the customer didn't write the ORM,
> likely has no idea how it works, and doesn't want to run a modified
> version of it even if they do. If the queries run faster on other
> systems than they do on PostgreSQL, we get dinged -- not unjustly.
>
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries. I mean, this case is trivial so it's easy to
> say, well, just rewrite the query. But suppose that I have a fact
> table over which I've created two views, each of which performs
> various joins between the fact table and various lookup tables. My
> queries are such that I normally need the joins in just one of these
> two views and not the other to fetch the information I care about.
> But every once in a while I need to run a report that involves pulling
> every column possible. The obvious solution is to join the views on
> the underlying table's primary key, but then you get this problem. Of
> course there's a workaround: define a third view that does both sets
> of joins-to-lookup-tables. But that starts to feel like you're
> handholding the database; surely it's the database's job to optimize
> queries, not the user's.
>
> It's been about 10 years since I worked as a web developer, but I do
> remember hitting this kind of problem from time to time and I'd really
> like to see us do something about it. I wish we could optimize away
> inner joins, too, for similar reasons.

I agree with everything you say.

What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.

More advanced optimizations should only be applied when we've assessed
that the likely run time is high enough to make it worth investing in
further optimization.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-05-16 20:10:45 Re: Removing unneeded self joins
Previous Message Andres Freund 2018-05-16 19:41:37 Re: Memory unit GUC range checks