ORDER BY Optimization

From: Derek Buttineau|Compu-SOLVE <derek(at)csolve(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: ORDER BY Optimization
Date: 2005-05-06 19:48:32
Message-ID: 427BCA10.4090208@csolve.net
Lists: pgsql-performance pgsql-sql

Good Day,

I'm hoping this is the right place to send this. I have a query that's
causing me some grief trying to optimize it. The query cost is fine
until I order the result set. Mind you, after it's been run and cached,
subsequent calls to it are near instant. The query in question is:

SELECT mr.*, m.*
FROM maillog_received mr
JOIN maillog m ON mr.maillog_id = m.id
WHERE mr.subscription = 89 AND m.spam = 1
ORDER BY m.msg_date DESC
LIMIT 10;

The structure of the tables involved in the query is as follows:

Table "public.maillog_received"
Column | Type |
Modifiers
--------------+------------------------+------------------------------------------------------------------
id | integer | not null default
nextval('public.maillog_received_id_seq'::text)
maillog_id | bigint | not null
subscription | integer | not null
local_part | character varying(255) | not null
domain | character varying(255) | not null
Indexes:
"maillog_received_pkey" PRIMARY KEY, btree (id)
"maillog_received_subscription_idx" btree (subscription)
"maillog_received_subscription_maillog_id_idx" btree (subscription,
maillog_id)
Foreign-key constraints:
"$1" FOREIGN KEY (subscription) REFERENCES subscriptions(id) ON
DELETE CASCADE
"$2" FOREIGN KEY (maillog_id) REFERENCES maillog(id) ON DELETE CASCADE
Triggers:
checkit BEFORE INSERT OR UPDATE ON maillog_received FOR EACH ROW
EXECUTE PROCEDURE check_sub()

Table "public.maillog"
Column | Type |
Modifiers
-----------------+-----------------------------+---------------------------------------------------------
id | bigint | not null default
nextval('public.maillog_id_seq'::text)
message_id | character varying(16) |
msg_date | timestamp without time zone | not null
from_local_part | character varying(255) | not null
from_domain | character varying(255) | not null
from_ip | character varying(128) |
from_host | character varying(255) |
subject | character varying(255) |
virus | integer | not null default 0
spam | integer | not null default 0
list | integer | not null default 0
bad_recipient | integer | not null default 0
bad_sender | integer | not null default 0
bad_relay | integer | not null default 0
bad_file | integer | not null default 0
bad_mime | integer | not null default 0
sascore | numeric(7,2) |
sareport | text |
vscanreport | text |
contentreport | text |
bypassed | integer | not null default 0
delivered | integer | not null default 1
complete | integer | not null default 1
Indexes:
"maillog_pkey" PRIMARY KEY, btree (id)
"maillog_msg_date_idx" btree (msg_date)

EXPLAIN ANALYZE gives me the following:

 Limit  (cost=31402.85..31402.87 rows=10 width=306) (actual time=87454.203..87454.334 rows=10 loops=1)
   ->  Sort  (cost=31402.85..31405.06 rows=886 width=306) (actual time=87454.187..87454.240 rows=10 loops=1)
         Sort Key: m.msg_date
         ->  Nested Loop  (cost=0.00..31359.47 rows=886 width=306) (actual time=4.740..86430.468 rows=26308 loops=1)
               ->  Index Scan using maillog_received_subscription_idx on maillog_received mr  (cost=0.00..17789.73 rows=4479 width=43) (actual time=0.030..33554.061 rows=65508 loops=1)
                     Index Cond: (subscription = 89)
               ->  Index Scan using maillog_pkey on maillog m  (cost=0.00..3.02 rows=1 width=263) (actual time=0.776..0.780 rows=0 loops=65508)
                     Index Cond: ("outer".maillog_id = m.id)
                     Filter: (spam = 1)
 Total runtime: 87478.068 ms
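One thing that jumps out at me in that plan is how far off the row estimates
are (886 estimated vs. 26308 actual rows in the nested loop, and 4479 vs.
65508 on the subscription index scan). In case stale statistics are part of
the problem, this is the refresh I could run here; the statistics target of
100 is just a first guess:

-- Refresh planner statistics on both tables (assuming they haven't been
-- analyzed recently on this server).
ANALYZE maillog_received;
ANALYZE maillog;

-- If the estimate for subscription = 89 stays off, raise that column's
-- statistics target and re-analyze.
ALTER TABLE maillog_received ALTER COLUMN subscription SET STATISTICS 100;
ANALYZE maillog_received;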

Now there is a lot of data in these tables, at least a few million
records, but I'd hoped to get a bit better performance :) Another odd
thing worth mentioning: if I take the same database schema to a second
server (both running PostgreSQL 8.0.2 on FreeBSD 5.3), I get a much
different (and, to me, much more efficient-looking) query plan, though
with substantially less data (about 500,000 records in the table):

 Limit  (cost=0.00..6482.60 rows=10 width=311) (actual time=25.340..26.885 rows=10 loops=1)
   ->  Nested Loop  (cost=0.00..1175943.99 rows=1814 width=311) (actual time=25.337..26.867 rows=10 loops=1)
         ->  Index Scan Backward using maillog_msg_date_idx on maillog m  (cost=0.00..869203.93 rows=51395 width=270) (actual time=25.156..26.050 rows=48 loops=1)
               Filter: (spam = 1)
         ->  Index Scan using maillog_received_subscription_maillog_id_idx on maillog_received mr  (cost=0.00..5.96 rows=1 width=41) (actual time=0.011..0.012 rows=0 loops=48)
               Index Cond: ((mr.subscription = 89) AND (mr.maillog_id = "outer".id))
 Total runtime: 27.016 ms
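If it would help, I can also run a quick diagnostic on the busier server to
see how it costs the backward msg_date scan once the explicit sort is taken
out of the picture (session-local setting only, reset right afterwards):

-- Diagnostic only: discourage the sort so the planner shows its costing for
-- the backward maillog_msg_date_idx scan, then restore the default.
SET enable_sort = off;

EXPLAIN ANALYZE
SELECT mr.*, m.*
FROM maillog_received mr
JOIN maillog m ON mr.maillog_id = m.id
WHERE mr.subscription = 89 AND m.spam = 1
ORDER BY m.msg_date DESC
LIMIT 10;

RESET enable_sort;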

Any suggestions?

--
Regards,

Derek Buttineau
Internet Systems Developer
Compu-SOLVE Internet Services
Compu-SOLVE Technologies Inc.

705.725.1212 x255
