Question about difference in performance of 2 queries on large table

From: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Question about difference in performance of 2 queries on large table
Date: 2003-12-29 16:35:43
Message-ID: 3FF057DF.7000605@earthlink.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

To all,

The facts:

PostgreSQL 7.4.0 running on BSD 5.1 on Dell 2650 with 4GB RAM, 5 SCSI
drives in hardware RAID 0 configuration. Database size with indexes is
currently 122GB. Schema for the table in question is at the end of this
email. The DB has been vacuumed full and analyzed. Between 2 and 3
million records are added to the table in question each night. An
analyze is run on the entire DB after the data has been loaded each
night. There are no updates or deletes of records during the nightly
load, only insertions.

I am trying to understand why the performance between the two queries
below is so different. I am trying to find the count of all pages that
have a 'valid' content_key. -1 is our 'we don't have any content' key.
The first plan below has horrendous performance. we only get about 2%
CPU usage and iostat shows 3-5 MB/sec IO. The second plan runs at 30%
cpu and 15-30MB.sec IO.

Could someone shed some light on why the huge difference in
performance? Both are doing index scans plus a filter. We have no
content_keys below -1 at this time so the queries return the same results.

Thanks.

--sean

explain select count (distinct (persistent_cookie_key) ) from
f_pageviews where date_key between 305 and 334 and content_key > -1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Aggregate (cost=688770.29..688770.29 rows=1 width=4)
-> Index Scan using idx_pageviews_content on f_pageviews
(cost=0.00..645971.34 rows=17119580 width=4)
Index Cond: (content_key > -1)
Filter: ((date_key >= 305) AND (date_key <= 334))
(4 rows)

explain select count (distinct (persistent_cookie_key) ) from
f_pageviews where date_key between 305 and 334 and content_key <> -1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1365419.12..1365419.12 rows=1 width=4)
-> Index Scan using idx_pageviews_date_nov_2003 on f_pageviews
(cost=0.00..1322615.91 rows=17121284 width=4)
Index Cond: ((date_key >= 305) AND (date_key <= 334))
Filter: (content_key <> -1)
(4 rows)

\d f_pageviews
Table "public.f_pageviews"
Column | Type | Modifiers
------------------------+---------+-------------------------------------------------------------
id | integer | not null default
nextval('public.f_pageviews_id_seq'::text)
date_key | integer | not null
time_key | integer | not null
content_key | integer | not null
location_key | integer | not null
session_key | integer | not null
subscriber_key | text | not null
persistent_cookie_key | integer | not null
ip_key | integer | not null
referral_key | integer | not null
servlet_key | integer | not null
tracking_key | integer | not null
provider_key | text | not null
marketing_campaign_key | integer | not null
orig_airport | text | not null
dest_airport | text | not null
commerce_page | boolean | not null default false
job_control_number | integer | not null
sequenceid | integer | not null default 0
url_key | integer | not null
useragent_key | integer | not null
web_server_name | text | not null default 'Not Available'::text
cpc | integer | not null default 0
referring_servlet_key | integer | default 1
first_page_key | integer | default 1
newsletterid_key | text | not null default 'Not Available'::text
Indexes:
"f_pageviews_pkey" primary key, btree (id)
"idx_pageviews_content" btree (content_key)
"idx_pageviews_date_dec_2003" btree (date_key) WHERE ((date_key >=
335) AND (date_key <= 365))
"idx_pageviews_date_nov_2003" btree (date_key) WHERE ((date_key >=
304) AND (date_key <= 334))
"idx_pageviews_referring_servlet" btree (referring_servlet_key)
"idx_pageviews_servlet" btree (servlet_key)
"idx_pageviews_session" btree (session_key)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-12-29 16:48:07 Re: Question about difference in performance of 2 queries on large table
Previous Message Tom Lane 2003-12-27 18:30:38 Re: What's faster?