Re: An Idea for planner hints

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: An Idea for planner hints
Date: 2006-08-24 18:44:43
Message-ID: 44EDF39B.5040407@markdilger.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Is there actually evidence that there's a lot of problems with bad join
> orders? ISTM that's one of the areas where the planner actually does a
> pretty good job.

I put together a quick demonstration using AxBxC where AxB is empty but AxC is
not. Sure enough, postgres chooses AxC first, then xB, which results in extra
work. This is a contrived example, but it would be a pain to try to post a real
example with all the data and analysis. I think it is fair to say that if it is
making the wrong choice in this example, it is sometimes making the wrong choice
in practice. Cross-table statistics are supposed to help avoid this, right?
But I think it would only help if the system had the statistics for AxB. I
think I have been hearing other people propose systems which would track which
joins the system is actually using and then recommend to the user that those
statistics be gathered. I think we need to go beyond that to recommending
statistics (or automatically gathering statistics, or whatever) for joins that
*might* be used given different plans than the one currently chosen by the planner.

test=# create table A (a integer);
CREATE TABLE
Time: 60.151 ms
test=# create table B (b integer);
CREATE TABLE
Time: 3.270 ms
test=# create table C (c integer);
CREATE TABLE
Time: 2.421 ms
test=# insert into A (a) (select * from generate_series(1,10000,2));
INSERT 0 5000
Time: 67.829 ms
test=# insert into B (b) (select * from generate_series(2,10000,2));
INSERT 0 5000
Time: 60.031 ms
test=# insert into C (c) (select * from generate_series(1,1000,2));
INSERT 0 500
Time: 6.303 ms
test=# analyze A;
ANALYZE
Time: 69.669 ms
test=# analyze B;
ANALYZE
Time: 24.548 ms
test=# analyze C;
ANALYZE
Time: 2.936 ms
test=# explain select * from A, B, C where A.a = B.b and A.a = C.c;
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=113.50..216.50 rows=500 width=12)
Hash Cond: ("outer".b = "inner".a)
-> Seq Scan on b (cost=0.00..73.00 rows=5000 width=4)
-> Hash (cost=112.25..112.25 rows=500 width=8)
-> Hash Join (cost=9.25..112.25 rows=500 width=8)
Hash Cond: ("outer".a = "inner".c)
-> Seq Scan on a (cost=0.00..73.00 rows=5000 width=4)
-> Hash (cost=8.00..8.00 rows=500 width=4)
-> Seq Scan on c (cost=0.00..8.00 rows=500 width=4)
(9 rows)

Time: 4.807 ms
test=# select * from A, B, C where A.a = B.b and A.a = C.c;
a | b | c
---+---+---
(0 rows)

Time: 34.561 ms
test=# select count(*) from A, C where A.a = C.c;
count
-------
500
(1 row)

Time: 8.450 ms
test=# select count(*) from A, B where A.a = B.b;
count
-------
0
(1 row)

Time: 33.757 ms

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2006-08-24 18:49:45 Re: PL/Perl: spi_prepare() and RETURNING
Previous Message Tom Lane 2006-08-24 18:41:44 Re: tsvector/tsearch equality and/or portability issue