From: | Mark Dilger <pgsql(at)markdilger(dot)com> |
---|---|
To: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: An Idea for planner hints |
Date: | 2006-08-24 18:44:43 |
Message-ID: | 44EDF39B.5040407@markdilger.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> Is there actually evidence that there's a lot of problems with bad join
> orders? ISTM that's one of the areas where the planner actually does a
> pretty good job.
I put together a quick demonstration using AxBxC where AxB is empty but AxC is
not. Sure enough, postgres chooses AxC first, then xB, which results in extra
work. This is a contrived example, but it would be a pain to try to post a real
example with all the data and analysis. I think it is fair to say that if it is
making the wrong choice in this example, it is sometimes making the wrong choice
in practice. Cross-table statistics are supposed to help avoid this, right?
But I think it would only help if the system had the statistics for AxB. I
think I have been hearing other people propose systems which would track which
joins the system is actually using and then recommend to the user that those
statistics be gathered. I think we need to go beyond that to recommending
statistics (or automatically gathering statistics, or whatever) for joins that
*might* be used given different plans than the one currently chosen by the planner.
test=# create table A (a integer);
CREATE TABLE
Time: 60.151 ms
test=# create table B (b integer);
CREATE TABLE
Time: 3.270 ms
test=# create table C (c integer);
CREATE TABLE
Time: 2.421 ms
test=# insert into A (a) (select * from generate_series(1,10000,2));
INSERT 0 5000
Time: 67.829 ms
test=# insert into B (b) (select * from generate_series(2,10000,2));
INSERT 0 5000
Time: 60.031 ms
test=# insert into C (c) (select * from generate_series(1,1000,2));
INSERT 0 500
Time: 6.303 ms
test=# analyze A;
ANALYZE
Time: 69.669 ms
test=# analyze B;
ANALYZE
Time: 24.548 ms
test=# analyze C;
ANALYZE
Time: 2.936 ms
test=# explain select * from A, B, C where A.a = B.b and A.a = C.c;
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=113.50..216.50 rows=500 width=12)
Hash Cond: ("outer".b = "inner".a)
-> Seq Scan on b (cost=0.00..73.00 rows=5000 width=4)
-> Hash (cost=112.25..112.25 rows=500 width=8)
-> Hash Join (cost=9.25..112.25 rows=500 width=8)
Hash Cond: ("outer".a = "inner".c)
-> Seq Scan on a (cost=0.00..73.00 rows=5000 width=4)
-> Hash (cost=8.00..8.00 rows=500 width=4)
-> Seq Scan on c (cost=0.00..8.00 rows=500 width=4)
(9 rows)
Time: 4.807 ms
test=# select * from A, B, C where A.a = B.b and A.a = C.c;
a | b | c
---+---+---
(0 rows)
Time: 34.561 ms
test=# select count(*) from A, C where A.a = C.c;
count
-------
500
(1 row)
Time: 8.450 ms
test=# select count(*) from A, B where A.a = B.b;
count
-------
0
(1 row)
Time: 33.757 ms
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2006-08-24 18:49:45 | Re: PL/Perl: spi_prepare() and RETURNING |
Previous Message | Tom Lane | 2006-08-24 18:41:44 | Re: tsvector/tsearch equality and/or portability issue |