Quick Links

Query optimizer 8.0.1 (and 8.0)

From:	pgsql(at)mohawksoft(dot)com
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Query optimizer 8.0.1 (and 8.0)
Date:	2005-02-04 22:10:18
Message-ID:	16398.24.91.171.78.1107555018.squirrel@mail.mohawksoft.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Here's one:
I have the USA census TIGER database loaded, the WHOLE THING, the whole
country. It isn't the biggest database, but it is about 40G before
indexes. Every table is over a few million rows. I can quite honestly say,
a sequential scan is almost never the right thing to do.

Info below.

Here is the query:
select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr = 2186 or
zipl=2186);

Well, when you look at the explain, obviously it looks like the sequential
scan is better, but trust me, it takes a good few minutes for the query to
respond with sequential scan enabled. I've run analyze and everything.

I suspect that analyze only samples a very small amount of the database
and gets the wrong idea about it. Is there a way to force analyze to
sample more rows?

tiger=# analyze verbose rt2 ;
INFO: analyzing "public.rt2"
INFO: "rt2": scanned 3000 of 1139825 pages, containing 84160 live rows
and 0 dead rows; 3000 rows in sample, 31975891 estimated total rows
ANALYZE
tiger=# analyze verbose rt1 ;
INFO: analyzing "public.rt1"
INFO: "rt1": scanned 3000 of 1527360 pages, containing 90951 live rows
and 0 dead rows; 3000 rows in sample, 46304973 estimated total rows
ANALYZE

tiger=# explain select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr
= 2186 or zipl=2186);
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Hash Join (cost=121978.81..3996190.89 rows=21118 width=520)
Hash Cond: ("outer".tlid = "inner".tlid)
-> Seq Scan on rt2 (cost=0.00..1459291.36 rows=31946636 width=218)
-> Hash (cost=120662.36..120662.36 rows=30579 width=302)
-> Index Scan using rt1_zipr, rt1_zipl on rt1
(cost=0.00..120662.36 rows=30579 width=302)
Index Cond: ((zipr = 2186) OR (zipl = 2186))
(6 rows)

tiger=# set enable_seqscan=no;
SET
tiger=# explain select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr
= 2186 or zipl=2186);
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..46868256.00 rows=21118 width=520)
-> Index Scan using rt1_zipr, rt1_zipl on rt1 (cost=0.00..120662.36
rows=30579 width=302)
Index Cond: ((zipr = 2186) OR (zipl = 2186))
-> Index Scan using rt2_tlid on rt2 (cost=0.00..1523.80 rows=396
width=218)
Index Cond: ("outer".tlid = rt2.tlid)

Responses

Re: Query optimizer 8.0.1 (and 8.0) at 2005-02-04 22:17:35 from Tom Lane
Re: Query optimizer 8.0.1 (and 8.0) at 2005-02-04 22:23:27 from Stephen Frost

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Philip Warner	2005-02-04 22:14:16	Re: Escaping the ARC patent
Previous Message	Tom Lane	2005-02-04 22:02:07	Re: Escaping the ARC patent