Project proposal/comments please - query optimization

From: Kim Bisgaard <kib+pg(at)dmi(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Project proposal/comments please - query optimization
Date: 2005-08-11 06:53:36
Message-ID: 42FAF5F0.3060700@dmi.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have noticed a deficiency in the current query optimizer related to
"full outer joins". Tom Lane has confirmed to me that it will not be 8.1
material. I am not able to wait for 8.2

I am in the lucky situation that my project has money to hire
consultants, so I would be very interested in hearing about any who
feels able to work on this, with estimates to costs. The sw developed
shall be freely available and will be given back into PostgreSQL, if the
project wants it. I actually think it should be a requirement that the
sw is accepted into PostgreSQL, but I do not know how to phrase it so
that it is acceptable to all parties.

The specific problem can be illustrated with two example queries.
Query1:

SELECT x, y, av, bv
FROM at a
FULL OUTER JOIN bt b
USING (x, y)
WHERE x = 52981
AND y = '2004-1-1 0:0:0';

Query2:

SELECT x, y, av, bv
FROM
(SELECT x, y, av
FROM at
WHERE x = 52981 AND y = '2004-1-1 0:0:0') a
FULL OUTER JOIN
(SELECT x, y, bv
FROM bt
WHERE x = 52981 AND y = '2004-1-1 0:0:0') b
USING (x, y);

Both queries select the same set of data (one record), but query2 is
able to use the indexes in doing so. By looking at the "explain analyze"
output it is clear that this is because the current PostgreSQL query
optimizer is not able to push the conditions (x = 52981 AND y =
'2004-1-1 0:0:0') down into the sub-queries, thus forcing the fetching
of all data from the tables, and then lastly filtering out the few
records (zero to one row from each table).

The reason why I say it is related to "full outer joins" it that if I
take Query1 and substitute "full" with "left", the optimizer is capable
of pushing the conditions out in the sub-selects, and is thus able to
use indices.

Looking forward for any comments. I am aware that there are workarounds
(like query2, union of two left-joins, hand coding the join from a
series of simple selects, ...) but I do not feel they are practical for
my use.

Regards,

--
Kim Bisgaard

Computer Department Phone: +45 3915 7562 (direct)
Danish Meteorological Institute Fax: +45 3915 7460 (division)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Soeren Laursen 2005-08-11 08:23:51 Re: Use of inv_getsize in functions
Previous Message Christopher Kings-Lynne 2005-08-11 02:33:12 SPARQL