Re: Performance Optimization for Dummies 2 - the SQL

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Optimization for Dummies 2 - the SQL
Date: 2006-10-16 17:33:28
Message-ID: eh0fmq$mpp$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> Can you try temporarily disabling bitmap scans and see what comes up?

Well, that's slowing everything down. I've got a couple of results, below

1) Bitmap scan off, but seq scan enabled.
2) Bitmap scan and seq scan off
3) Bitmap scan back on, seq scan back on, and a new index created
4) VACUUM VERBOSE on the tables involved
5) Original SQL with original EXPLAIN to show the code that started this.

Carlo

1) Bitmap scan off, but seq scan enabled. It created a suprisingly expensive
seq scan.

"Nested Loop Left Join (cost=0.00..34572.43 rows=109 width=71) (actual
time=1536.827..1536.827 rows=0 loops=1)"
" Filter: ("inner".facility_address_id IS NULL)"
" -> Seq Scan on facility f (cost=0.00..34146.91 rows=109 width=71)
(actual time=621.100..1536.606 rows=7 loops=1)"
" Filter: ((default_country_code = 'US'::bpchar) AND
(((default_postal_code)::text = '14224-1945'::text) OR
((default_postal_code)::text = '14224'::text)))"
" -> Index Scan using facility_address_facility_address_address_type_idx
on facility_address fa (cost=0.00..3.89 rows=1 width=8) (actual
time=0.020..0.023 rows=1 loops=7)"
" Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 1536.957 ms"

2) So I turned both bitmap scan and seq scan off - now we get index scans,
the performance is suprisingly horrible:

"Nested Loop Left Join (cost=0.00..39286.55 rows=109 width=71) (actual
time=3598.462..3598.462 rows=0 loops=1)"
" Filter: ("inner".facility_address_id IS NULL)"
" -> Index Scan using facility_pkey on facility f (cost=0.00..38861.03
rows=109 width=71) (actual time=1500.690..3598.201 rows=7 loops=1)"
" Filter: ((default_country_code = 'US'::bpchar) AND
(((default_postal_code)::text = '14224-1945'::text) OR
((default_postal_code)::text = '14224'::text)))"
" -> Index Scan using facility_address_facility_address_address_type_idx
on facility_address fa (cost=0.00..3.89 rows=1 width=8) (actual
time=0.024..0.027 rows=1 loops=7)"
" Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 3598.600 ms"

3) So I turned bitmap scan back on, seq scan back on, and created an index
to EXPLICITLY to satisfy this condition. Iintuitivly, I thought that
combinations of other indexes should have satisfied the optimizer, but
figured better overkill than nothing. I thought this would solve it - but
no. We is using a BRAND NEW INDEX which is unlikely to be corrupt so
expensive?

"Nested Loop Left Join (cost=25300.96..26043.67 rows=110 width=71) (actual
time=1339.216..1339.216 rows=0 loops=1)"
" Filter: ("inner".facility_address_id IS NULL)"
" -> Bitmap Heap Scan on facility f (cost=25300.96..25614.42 rows=110
width=71) (actual time=1339.043..1339.066 rows=7 loops=1)"
" Recheck Cond: (((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224-1945'::text)) OR
((default_country_code = 'US'::bpchar) AND ((default_postal_code)::text =
'14224'::text)))"
" -> BitmapOr (cost=25300.96..25300.96 rows=110 width=0) (actual
time=1339.027..1339.027 rows=0 loops=1)"
" -> Bitmap Index Scan on
facility_facility_country_state_postal_code_idx (cost=0.00..12650.48
rows=55 width=0) (actual time=796.146..796.146 rows=7 loops=1)"
" Index Cond: ((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224-1945'::text))"
" -> Bitmap Index Scan on
facility_facility_country_state_postal_code_idx (cost=0.00..12650.48
rows=55 width=0) (actual time=542.873..542.873 rows=0 loops=1)"
" Index Cond: ((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224'::text))"
" -> Index Scan using facility_address_facility_address_address_type_idx
on facility_address fa (cost=0.00..3.89 rows=1 width=8) (actual
time=0.014..0.016 rows=1 loops=7)"
" Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 1339.354 ms"

4) VACUUM VERBOSE on the tables involved. Note how much more painful in
elapsed time it is to vacuum facility vs facility_address, even though the
number of rows is comparable:

INFO: vacuuming "mdx_core.facility"
INFO: index "facility_pkey" now contains 964123 row versions in 3682 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.03u sec elapsed 0.18 sec.
INFO: index "facility_country_state_city_idx" now contains 964188 row
versions in 7664 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.25s/0.17u sec elapsed 84.14 sec.
INFO: index "facility_country_state_postal_code_idx" now contains 964412
row versions in 7689 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.42s/0.10u sec elapsed 137.12 sec.
INFO: index "facility_facility_country_state_city_idx" now contains 964493
row versions in 6420 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.17s/0.09u sec elapsed 2.23 sec.
INFO: index "facility_facility_country_state_postal_code_idx" now contains
964494 row versions in 6895 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 0.95 sec.
INFO: "facility": found 0 removable, 964123 nonremovable row versions in
17398 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.90s/0.57u sec elapsed 224.80 sec.
INFO: vacuuming "pg_toast.pg_toast_58570311"
INFO: index "pg_toast_58570311_index" now contains 0 row versions in 1
pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: "pg_toast_58570311": found 0 removable, 0 nonremovable row versions
in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.

Query returned successfully with no result in 224903 ms.

INFO: vacuuming "mdx_core.facility_address"
INFO: index "facility_address_pkey" now contains 893157 row versions in
3411 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.17s/0.04u sec elapsed 11.10 sec.
INFO: index "facility_address_address_idx" now contains 893157 row versions
in 3164 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.07s/0.04u sec elapsed 0.61 sec.
INFO: index "facility_address_facility_address_address_type_idx" now
contains 893157 row versions in 3797 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.07 sec.
INFO: "facility_address": found 0 removable, 893139 nonremovable row
versions in 9210 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.26s/0.15u sec elapsed 12.14 sec.

Query returned successfully with no result in 12297 ms.

5) Here is the original query, plus original explain analyze. Number of rows
have increased since this was run, so the costs are lower, but still
significant:

/*
Find all facilities that do not have full address information
but do have default location information that indicates
its the facilitiy's US zip code.
NULL values cast as columns are placeholders to allow
this sub-query to be unioned with another subquery
that contains full address data
*/
select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code,
null as parsed_unit
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
facility_address_id is null
and f.default_country_code = 'US'
and (f.default_postal_code = '14224-1945' or f.default_postal_code =
'14224')

"Nested Loop Left Join (cost=22966.70..23594.84 rows=93 width=71) (actual
time=662.075..662.075 rows=0 loops=1)"
" Filter: ("inner".facility_address_id IS NULL)"
" -> Bitmap Heap Scan on facility f (cost=22966.70..23231.79 rows=93
width=71) (actual time=661.907..661.929 rows=7 loops=1)"
" Recheck Cond: (((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224-1945'::text)) OR
((default_country_code = 'US'::bpchar) AND ((default_postal_code)::text =
'14224'::text)))"
" -> BitmapOr (cost=22966.70..22966.70 rows=93 width=0) (actual
time=661.891..661.891 rows=0 loops=1)"
" -> Bitmap Index Scan on
facility_country_state_postal_code_idx (cost=0.00..11483.35 rows=47
width=0) (actual time=374.284..374.284 rows=7 loops=1)"
" Index Cond: ((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224-1945'::text))"
" -> Bitmap Index Scan on
facility_country_state_postal_code_idx (cost=0.00..11483.35 rows=47
width=0) (actual time=287.599..287.599 rows=0 loops=1)"
" Index Cond: ((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224'::text))"
" -> Index Scan using facility_address_facility_address_address_type_idx
on facility_address fa (cost=0.00..3.89 rows=1 width=8) (actual
time=0.014..0.016 rows=1 loops=7)"
" Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 662.203 ms"
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Harald Armin Massa 2006-10-16 20:13:38 measuring shared memory usage on Windows
Previous Message Shaun Thomas 2006-10-16 17:00:01 Re: Hints proposal