Re: Eliminating unnecessary left joins

From: "Nicolas Barbier" <nicolas(dot)barbier(at)gmail(dot)com>
To: Ottó Havasvölgyi <havasvolgyi(dot)otto(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Eliminating unnecessary left joins
Date: 2007-04-08 12:02:26
Message-ID: b0f3f5a10704080502r25e85476o483dbbc403cc31ff@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2007/4/7, Ottó Havasvölgyi <havasvolgyi(dot)otto(at)gmail(dot)com>:

> My simple example:
>
> Class hierarchy and fields:
> Shape (ID, X, Y)
> |
> +-Circle (ID, Radius)
> |
> +-Rectangle (ID, Width, Height)
>
> The mapper creates 3 tables with the columns next to the class name.
> And it creates 3 views. One of them:
>
> RectangleView: SELECT r."ID" as "ID", s."X" as "X", s."Y" as "Y", r."Width"
> as "Width", r."Height" as "Height" FROM "Rectangle" r LEFT JOIN "Shape" s ON
> ( r.ID=s.ID)

I find this view definition a bit strange: why is there a left outer
join? I expect there to be a FK from Rectangle.ID to Shape.ID ("all
rectangles are shapes"), which makes the definition totally equivalent
with one in which a normal join is used (whether attributes of Shape
are used or not).

The main use case I see for the original optimization is ORMs that
join in a whole hierarchy, even when only a part of it is needed. I
guess that that is rather common. The ORM that I use does exactly
this, because the main target-DBMSs (MS-SQL and Oracle) do the
optimization for it.

Example (somewhat less contrived than my previous one):

Imagine an implementation of the typical "books that are borrowed by
people" n-m relationship, using three tables ("Book", "Borrowed",
"Person"). Let's find all books that have been borrowed by a certain
person.

The "non-ORM" version would be something like:

SELECT Book.*
FROM
Book
JOIN Borrowed ON Borrowed.book_id = Book.id
WHERE Borrowed.person_id = <x>;

Now assume that Borrowed is a class hierarchy mapped into multiple
tables by a typical ORM. The query would probably become something
like:

SELECT Book.*
FROM
Book
JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.id
LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id
LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id
(...)
WHERE Borrowed_Parent.person_id = <x>;

It is clear that the children of the hierarchy are needlessly joined
in (as the only attribute that is actually needed is person_id, which
is on the parent level). It is not always trivial for the ORM to find
that out, without writing stuff that looks suspiciously similar to a
DBMS optimizer.

Maybe it is debatable whether this optimization should be done by the
application (i.e. the ORM) or by the DBMS. I am personally in favor of
doing it in the DBMS.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-04-08 15:05:37 Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee
Previous Message Simon Riggs 2007-04-08 08:46:45 Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee