PG7.4.5: query not using index on date column

From: Dave Steinberg <dave-dated-1101824919(dot)46cd20(at)redterror(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: PG7.4.5: query not using index on date column
Date: 2004-11-23 14:28:16
Message-ID: 20041123092816.51d7aa5e.dave@redterror.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi Folks, I was hoping someone could help me to improve the performance of a query I've got that insists on doing a seq. scan on a large table. I'm trying to do some reporting based on my spam logs which I've partly digested and stored in a table. Here are the particulars:

The messages table:

Column | Type | Modifiers
-------------------+-----------------------+--------------
message_id | integer | not null default nextval('spamreport.messages_message_id_seq'::text)
received_date | date | not null
score | numeric | not null
user_threshold | numeric | not null
raw_email_address | character varying(64) | not null
processing_time | numeric | not null
size | integer | not null
fuzzed_address | character varying(64) | not null
domain | character varying(64) | not null
Indexes:
"messages_pkey" primary key, btree (message_id)
"domain_idx" btree ("domain")
"fuzzy_idx" btree (fuzzed_address)
"received_date_idx" btree (received_date)

And here's the primary query I run, along with explain analyze output:

>> explain analyze SELECT * FROM ( SELECT
domain,
count(*) as count,
max(score) as max_score,
avg(score) as average_score,
stddev(score) as stddev_score,
sum(CASE WHEN score > user_threshold THEN 1 ELSE 0 END) as spam_count,
avg(processing_time) as average_time,
avg(size) as average_size
FROM messages
WHERE received_date BETWEEN '2004-11-01' AND '2004-11-30'
GROUP BY domain ) AS aggs
ORDER BY count DESC;

QUERY PLAN
-------------------------------------------------------------------
Sort (cost=30303.51..30303.60 rows=35 width=226) (actual time=29869.716..29869.883 rows=69 loops=1)
Sort Key: count
-> Subquery Scan aggs (cost=30301.56..30302.61 rows=35 width=226) (actual time=29861.705..29869.240 rows=69 loops=1)
-> HashAggregate (cost=30301.56..30302.26 rows=35 width=54) (actual time=29861.681..29868.261 rows=69 loops=1)
-> Seq Scan on messages (cost=0.00..21573.04 rows=436426 width=54) (actual time=5.523..6304.657 rows=462931 loops=1)
Filter: ((received_date >= '2004-11-01'::date) AND (received_date <= '2004-11-30'::date))
Total runtime: 29870.437 ms

This database gets vacuumed nightly. Also, the query plan stays the same even if I restrict the received_date column down to a single day.

Thanks in advance,
--
Dave Steinberg
http://www.geekisp.com/

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Vishal Kashyap @ [SaiHertz] 2004-11-23 18:39:29 Re: Image Insert Doubt
Previous Message Premsun Choltanwanich 2004-11-23 09:45:21 Re: Image Insert Postgresql DB