Query optimizer 8.0.1 (and 8.0)

From: pgsql(at)mohawksoft(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Query optimizer 8.0.1 (and 8.0)
Date: 2005-02-04 22:10:18
Message-ID: 16398.24.91.171.78.1107555018.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's one:
I have the USA census TIGER database loaded, the WHOLE THING, the whole
country. It isn't the biggest database, but it is about 40G before
indexes. Every table is over a few million rows. I can quite honestly say,
a sequential scan is almost never the right thing to do.

Info below.

Here is the query:
select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr = 2186 or
zipl=2186);

Well, when you look at the explain, obviously it looks like the sequential
scan is better, but trust me, it takes a good few minutes for the query to
respond with sequential scan enabled. I've run analyze and everything.

I suspect that analyze only samples a very small amount of the database
and gets the wrong idea about it. Is there a way to force analyze to
sample more rows?

tiger=# analyze verbose rt2 ;
INFO: analyzing "public.rt2"
INFO: "rt2": scanned 3000 of 1139825 pages, containing 84160 live rows
and 0 dead rows; 3000 rows in sample, 31975891 estimated total rows
ANALYZE
tiger=# analyze verbose rt1 ;
INFO: analyzing "public.rt1"
INFO: "rt1": scanned 3000 of 1527360 pages, containing 90951 live rows
and 0 dead rows; 3000 rows in sample, 46304973 estimated total rows
ANALYZE

tiger=# explain select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr
= 2186 or zipl=2186);
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Hash Join (cost=121978.81..3996190.89 rows=21118 width=520)
Hash Cond: ("outer".tlid = "inner".tlid)
-> Seq Scan on rt2 (cost=0.00..1459291.36 rows=31946636 width=218)
-> Hash (cost=120662.36..120662.36 rows=30579 width=302)
-> Index Scan using rt1_zipr, rt1_zipl on rt1
(cost=0.00..120662.36 rows=30579 width=302)
Index Cond: ((zipr = 2186) OR (zipl = 2186))
(6 rows)

tiger=# set enable_seqscan=no;
SET
tiger=# explain select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr
= 2186 or zipl=2186);
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..46868256.00 rows=21118 width=520)
-> Index Scan using rt1_zipr, rt1_zipl on rt1 (cost=0.00..120662.36
rows=30579 width=302)
Index Cond: ((zipr = 2186) OR (zipl = 2186))
-> Index Scan using rt2_tlid on rt2 (cost=0.00..1523.80 rows=396
width=218)
Index Cond: ("outer".tlid = rt2.tlid)

Table "public.rt1"
Column | Type | Modifiers
-----------+-------------------+-----------
tlid | integer |
side1 | character varying |
source | character varying |
fedirp | character varying |
fename | character varying |
fetype | character varying |
fedirs | character varying |
cfcc | character varying |
fraddl | character varying |
toaddl | character varying |
fraddr | character varying |
toaddr | character varying |
friaddl | character varying |
toiaddl | character varying |
friaddr | character varying |
toiaddr | character varying |
zipl | integer |
zipr | integer |
aianhhfpl | character varying |
aianhhfpr | character varying |
aihhtlil | character varying |
aihhtlir | character varying |
statel | character varying |
stater | character varying |
countyl | character varying |
countyr | character varying |
cousubl | character varying |
cousubr | character varying |
submcdl | character varying |
submcdr | character varying |
placel | character varying |
placer | character varying |
tractl | character varying |
tractr | character varying |
blockl | character varying |
blockr | character varying |
frlong | numeric |
frlat | numeric |
tolong | numeric |
tolat | numeric |
Indexes:
"rt1_fename" btree (fename)
"rt1_frlat" btree (frlat)
"rt1_frlong" btree (frlong)
"rt1_tlid" btree (tlid)
"rt1_tolat" btree (tolat)
"rt1_tolong" btree (tolong)
"rt1_zipl" btree (zipl)
"rt1_zipr" btree (zipr)

Table "public.rt2"
Column | Type | Modifiers
--------+---------+-----------
tlid | integer |
rtsq | integer |
long1 | numeric |
lat1 | numeric |
long2 | numeric |
lat2 | numeric |
long3 | numeric |
lat3 | numeric |
long4 | numeric |
lat4 | numeric |
long5 | numeric |
lat5 | numeric |
long6 | numeric |
lat6 | numeric |
long7 | numeric |
lat7 | numeric |
long8 | numeric |
lat8 | numeric |
long9 | numeric |
lat9 | numeric |
long10 | numeric |
lat10 | numeric |
Indexes:
"rt2_tlid" btree (tlid)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Philip Warner 2005-02-04 22:14:16 Re: Escaping the ARC patent
Previous Message Tom Lane 2005-02-04 22:02:07 Re: Escaping the ARC patent