Optimizer isn't perfect

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Optimizer isn't perfect
Date: 2003-08-29 14:37:32
Message-ID: 87lltcil77.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hm, here's a query where the optimizer is choosing the wrong plan by far. I
think it boils down to it guessing wrong on how selective an rtree index is,
which I guess would be hard to predict.

Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.

This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.

Even if the full table scan was faster with one dataset I might prefer to use
the slower index lookup and live with the lower performance but be able to
guarantee that a sudden shift in data distribution or user behaviour wouldn't
suddenly cause a complete outage. I'm wondering if it doesn't make sense to go
into production with enable_seqscan = off.

slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo->
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125)
-> Hash IN Join (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125)
Hash Cond: ("outer".thing_id = "inner".thing_id)
-> Seq Scan on xx_thing (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423 loops=125)
-> Hash (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.21..0.42 rows=8 loops=125)
Index Cond: (group_id = $0)
Total runtime: 7424.11 msec
(12 rows)

slo=> slo=> set enable_seqscan = off;
SET

slo=> set enable_mergejoin = off;
SET

slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo-> QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125)
-> Nested Loop (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125)
-> HashAggregate (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.09..0.22 rows=8 loops=125)
Index Cond: (group_id = $0)
-> Index Scan using idx_xx_thing_loc on xx_thing (cost=0.00..8.43 rows=7 width=8) (actual time=0.05..0.08 rows=1 loops=1055)
Index Cond: (xx_thing.thing_id = "outer".thing_id)
Total runtime: 252.60 msec
(12 rows)

slo=> >

--
greg

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jonathan Bartlett 2003-08-29 14:51:47 Re: left outer join terrible slow compared to inner join
Previous Message Gavani, Uma 2003-08-29 14:32:16 Problems connecting to the database -_mdfd_getrelnfd: cannot open relation pg_type: Permission denied