Suggestions wanted for 7.2.4 query

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Suggestions wanted for 7.2.4 query
Date: 2003-05-04 02:28:52
Message-ID: 200305031928.52906.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Folks,

I have a common query on a production database that's running a little too
slow (3-6 seconds). I can currently drop the time to 0.8 seconds by
setting enable_seqscan = false; the main reason is the planner poorly
deciding to use a seq scan for the hash join between "events" and "cases",
mostly due to a dramatically bad estimate of the number of rows required from
"cases".

Suggestions on how to get Postgres to use cases_pkey instead of a seq scan on
cases without messing with global query settings in the database which might
make other queries run slower? (And yes, a VACUUM FULL ANALYZE was
involved).

The View:
create view sv_events as
select events.event_id, events.status, status_label, status.rollup as rstatus,
events.etype_id, type_name,
event_cats.ecat_id, cat_name, events.event_date, events.event_name,
jw_date_format(events.event_date, events.event_tz, events.duration) as
show_date,
cases.case_name || '(' || cases.docket || ')' as event_case,
events.case_id, cases.case_name, cases.docket, NULL::VARCHAR as tgroup_name,
events.location_id, location_name, locations.zip_code, locations.address,
locations.state_code, locations.city,
lu.user_name as lock_name, lu.email as lock_email, lu.user_id AS lock_user
FROM status, locations, event_types, event_cats, cases,
events LEFT OUTER JOIN lock_users lu ON events.event_id = lock_record
WHERE events.status <> 0
AND (events.status = status.status AND status.relation = 'events')
AND events.location_id = locations.location_id
AND event_types.etype_id = events.etype_id
AND event_cats.ecat_id = event_types.ecat_id
AND events.case_id = cases.case_id;

The Query:
SELECT sv_events.*, FALSE AS fuzzy_team FROM sv_events WHERE EXISTS ( SELECT
event_id FROM event_days
WHERE event_days.event_id = sv_events.event_id AND (event_day BETWEEN
('2003-04-08'::TIMESTAMP WITHOUT TIME ZONE)
AND ('2003-06-17 23:59'::TIMESTAMP WITHOUT TIME ZONE) ) );

The Explain:
jwnet_test=> \i perform.sql
psql:perform.sql:9: NOTICE: QUERY PLAN:

Limit (cost=199572.58..199572.58 rows=10 width=368) (actual
time=3239.95..3239.96 rows=10 loops=1)
-> Sort (cost=199572.58..199572.58 rows=33575 width=368) (actual
time=3239.92..3239.93 rows=41 loops=1)
-> Hash Join (cost=6576.62..191013.53 rows=33575 width=368) (actual
time=513.49..3220.38 rows=1790 loops=1)
-> Hash Join (cost=6574.72..189924.26 rows=14837 width=350)
(actual time=509.20..3063.85 rows=1790 loops=1)
-> Hash Join (cost=38.81..180804.32 rows=14837
width=304) (actual time=16.38..452.80 rows=1919 loops=1)
-> Hash Join (cost=33.92..180539.78 rows=14837
width=252) (actual time=15.68..428.38 rows=1919 loops=1)
-> Hash Join (cost=22.17..180231.28
rows=14837 width=155) (actual time=13.98..406.61 rows=1919 loops=1)
-> Seq Scan on events
(cost=0.00..179874.82 rows=14837 width=67) (actual time=0.27..382.47
rows=1919 loops=1)
SubPlan
-> Index Scan using
event_days_pk on event_days (cost=0.00..6.01 rows=1 width=4) (actual
time=0.01..0.01 rows=0 loops=29734)
-> Hash (cost=21.99..21.99 rows=72
width=83) (actual time=13.66..13.66 rows=0 loops=1)
-> Subquery Scan lu
(cost=12.61..21.99 rows=72 width=83) (actual time=13.64..13.65 rows=1
loops=1)
-> Hash Join
(cost=12.61..21.99 rows=72 width=83) (actual time=13.63..13.64 rows=1
loops=1)
-> Seq Scan on
edit_locks (cost=0.00..7.94 rows=72 width=26) (actual time=12.82..12.83
rows=1 loops=1)
-> Hash
(cost=6.50..6.50 rows=150 width=57) (actual time=0.71..0.71 rows=0 loops=1)
-> Seq Scan on
users (cost=0.00..6.50 rows=150 width=57) (actual time=0.01..0.47 rows=150
loops=1)
-> Hash (cost=11.00..11.00 rows=300
width=97) (actual time=1.66..1.66 rows=0 loops=1)
-> Seq Scan on locations
(cost=0.00..11.00 rows=300 width=97) (actual time=0.01..1.11 rows=300
loops=1)
-> Hash (cost=4.75..4.75 rows=56 width=52) (actual
time=0.60..0.60 rows=0 loops=1)
-> Hash Join (cost=1.21..4.75 rows=56
width=52) (actual time=0.17..0.51 rows=56 loops=1)
-> Seq Scan on event_types
(cost=0.00..2.56 rows=56 width=31) (actual time=0.01..0.15 rows=56 loops=1)
-> Hash (cost=1.17..1.17 rows=17
width=21) (actual time=0.07..0.07 rows=0 loops=1)
-> Seq Scan on event_cats
(cost=0.00..1.17 rows=17 width=21) (actual time=0.01..0.05 rows=17 loops=1)
-> Hash (cost=3800.07..3800.07 rows=112107 width=46)
(actual time=491.84..491.84 rows=0 loops=1)
-> Seq Scan on cases (cost=0.00..3800.07
rows=112107 width=46) (actual time=0.01..277.20 rows=112107 loops=1)
-> Hash (cost=1.88..1.88 rows=10 width=18) (actual
time=0.12..0.12 rows=0 loops=1)
-> Seq Scan on status (cost=0.00..1.88 rows=10 width=18)
(actual time=0.03..0.11 rows=10 loops=1)
Total runtime: 3241.09 msec

The Index Scan:
jwnet_test=> set enable_seqscan = false;
SET VARIABLE
jwnet_test=> \i perform.sql
psql:perform.sql:9: NOTICE: QUERY PLAN:

Limit (cost=252608.52..252608.52 rows=10 width=368) (actual
time=740.62..740.64 rows=10 loops=1)
-> Sort (cost=252608.52..252608.52 rows=33469 width=368) (actual
time=740.60..740.61 rows=41 loops=1)
-> Hash Join (cost=86.85..244083.21 rows=33469 width=368) (actual
time=20.93..720.70 rows=1790 loops=1)
-> Hash Join (cost=80.75..242992.18 rows=14812 width=350)
(actual time=16.69..554.62 rows=1790 loops=1)
-> Nested Loop (cost=49.20..242664.38 rows=14812
width=253) (actual time=14.56..519.42 rows=1790 loops=1)
-> Hash Join (cost=49.20..158631.12 rows=14812
width=207) (actual time=14.40..459.91 rows=1919 loops=1)
-> Hash Join (cost=32.78..158355.48
rows=14812 width=155) (actual time=13.59..442.08 rows=1919 loops=1)
-> Index Scan using idx_events_status
on events (cost=0.00..157988.97 rows=14812 width=67) (actual
time=0.08..416.67 rows=1919 loops=1)
SubPlan
-> Index Scan using
event_days_pk on event_days (cost=0.00..5.26 rows=1 width=4) (actual
time=0.01..0.01 rows=0 loops=29734)
-> Hash (cost=32.60..32.60 rows=72
width=83) (actual time=13.47..13.47 rows=0 loops=1)
-> Subquery Scan lu
(cost=0.00..32.60 rows=72 width=83) (actual time=1.60..13.46 rows=1 loops=1)
-> Merge Join
(cost=0.00..32.60 rows=72 width=83) (actual time=1.59..13.45 rows=1 loops=1)
-> Index Scan using
users_pkey on users (cost=0.00..19.63 rows=150 width=57) (actual
time=0.09..0.12 rows=3 loops=1)
-> Index Scan using
edit_locks_user_id on edit_locks (cost=0.00..11.51 rows=72 width=26) (actual
time=1.43..13.28 rows=1 loops=1)
-> Hash (cost=16.28..16.28 rows=56 width=52)
(actual time=0.77..0.77 rows=0 loops=1)
-> Hash Join (cost=5.67..16.28 rows=56
width=52) (actual time=0.29..0.68 rows=56 loops=1)
-> Index Scan using
event_types_pkey on event_types (cost=0.00..9.63 rows=56 width=31) (actual
time=0.08..0.28 rows=56 loops=1)
-> Hash (cost=5.63..5.63 rows=17
width=21) (actual time=0.15..0.15 rows=0 loops=1)
-> Index Scan using
event_cats_pkey on event_cats (cost=0.00..5.63 rows=17 width=21) (actual
time=0.08..0.13 rows=17 loops=1)
-> Index Scan using cases_pkey on cases
(cost=0.00..5.66 rows=1 width=46) (actual time=0.02..0.02 rows=1 loops=1919)
-> Hash (cost=30.80..30.80 rows=300 width=97) (actual
time=2.07..2.07 rows=0 loops=1)
-> Index Scan using locations_pkey on locations
(cost=0.00..30.80 rows=300 width=97) (actual time=0.09..1.61 rows=300
loops=1)
-> Hash (cost=6.07..6.07 rows=10 width=18) (actual
time=0.08..0.08 rows=0 loops=1)
-> Index Scan using status_relation on status
(cost=0.00..6.07 rows=10 width=18) (actual time=0.03..0.06 rows=10 loops=1)
Total runtime: 741.72 msec

--
-Josh Berkus
Aglio Database Solutions
San Francisco

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Kings-Lynne 2003-05-04 04:05:42 Re: NOT IN doesn't use index? (fwd)
Previous Message Rod Taylor 2003-05-03 21:16:41 Re: NOT IN doesn't use index? (fwd)