Re: Any optimizations to the join code in 7.1?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joel Burton <jburton(at)scw(dot)org>
Cc: "mascarm(at)mascari(dot)com" <mascarm(at)mascari(dot)com>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Any optimizations to the join code in 7.1?
Date: 2001-04-25 17:46:47
Message-ID: 29815.988220807@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Joel Burton <jburton(at)scw(dot)org> writes:
> 1) it appears (from my tests) that SELECT * FROM

> CREATE VIEW joined as
> SELECT p.id,
> p.pname,
> c.cname
> FROM p
> LEFT OUTER JOIN c using (id)

> gives the same answer as SELECT * FROM

> CREATE VIEW nested
> SELECT p.id,
> p.pname,
> (select c.cname from c where c.id = p.id)
> FROM p

Only if c.id is a unique column (ie, there are always 0 or 1 matches in
c for any given p.id). Otherwise the subselect form will fail.

> However, I often am writing VIEWs that will be used by developers
> in a front-end system. Usually, this view might have 30 items in the
> select clause, but the developer using it is likely to only as for
> four or five items. In this case, I often prefer the
> subquery form because it appears that
> SELECT id, pname FROM joined
> is more complicated than
> SELECT id, pname FROM nested
> as the first has to perform the join, and the second doesn't.

> Is this actually correct?

This approach is probably reasonable if the cname field of the view
result is seldom wanted at all, and never used as a WHERE constraint.
You'd get a very nonoptimal plan if someone did

select * from nested where cname like 'foo%'

since the planner has no way to use the LIKE constraint to limit the
rows fetched from p. In the JOIN format, on the other hand, I think
the constraint could be exploited.

Also bear in mind that the subselect form is essentially forcing the
join to be done via a nested loop. If you have an index on c.id then
this may not be too bad, but without one the performance will be
horrid. Even with an index, nested loop with inner indexscan is not
the join method of choice if you are retrieving a lot of rows.

> 2) The explicit-joins help suggests that manual structuring and
> experimentation might help -- has anyone written (or could
> anyone write) anthing about where to start in guessing what
> join order might be optimal?

The obvious starting point is the plan produced by the planner from an
unconstrained query. Even if you don't feel like trying to improve it,
you could cut the time to reproduce the plan quite a bit --- just CROSS
JOIN a few of the relation pairs that are joined first in the
unconstrained plan.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2001-04-25 18:28:18 Re: refusing connections based on load ...
Previous Message Tom Lane 2001-04-25 17:24:14 Re: refusing connections based on load ...