Re: Planner creating ineffective plans on LEFT OUTER joins

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Planner creating ineffective plans on LEFT OUTER joins
Date: 2008-06-27 01:21:47
Message-ID: 200806270321.54235.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thursday 26 June 2008 04:36:09 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > SELECT *
> > FROM
> > ab LEFT OUTER JOIN (
> > bc JOIN cd
> > ON bc.c = cd.d
> > )
> > ON ab.b = bc.b
> >
> > WHERE
> > ab.a = 20000
> >
> > As ab.a = 20000 occurs only once in ab one would expect that it just does
> > an index scan on bc for ab.b = bc.b.
There was a typo in here (ON bc.c = cd.d should be ON bc.c = cd.c):
http://anarazel.de/postgres/testtable_query4.plan
Better query plan, but it still not optimal - interestingly the query plan
works out perfecty for ab.a = 10:
http://anarazel.de/postgres/testtable_query3.plan
....

> The only way it could do that would be by interchanging the order of the
> left and inner joins, ie (ab left join bc) join cd; which would change
> the results.
My knowledge about the implementation side of relational databases is quite
limited, so my ideas may be quite flawed:
The planner already recognizes that the left side of the join is quite small
and the right side will be very big.
Why cant it optimize the query the same way it does for a inner join, namely
doing an index lookup on bc?
I dont see the fundamental problem?

> I believe it could interchange the joins if they were both LEFT or
> both INNER. Do you really need exactly these semantics?
I don't see an easy/effective way to express it:
I need all data belonging left side of the join (proband) through a series
(participation -> answer_group -> answer -> data) of
inner joins and NULL if there is no data.
If there would be only one such join it wouldn't be a problem - but a normal
query has around 20 such LEFT JOINS.
Currently I solve this through separately inserting the data for each join
into a temporary table which is still way much faster. But not having the
statistics the planner has selecting a good order isn't that easy. Besides its
not very elegant.
So, if somebody has a better idea...

If I can use my time to improve pg instead of working around the problem on
clientside both me and my employer will be happy...

Thanks,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-06-27 02:38:45 Re: pgsql: Make the pg_stat_activity view call a SRF
Previous Message Tom Lane 2008-06-27 00:37:19 Re: CVS Head psql bug?