query optimization question

From: Jack Coates <jack(at)lyris(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: query optimization question
Date: 2004-01-29 01:05:54
Message-ID: 1075338353.7494.71.camel@cletus.lyris.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I've got a query that needs some help, please. Is there a way to avoid
all the looping? I've got freedom to work with the double-indented
sections below ) AND (, but the initial select distinct wrapper is much
more difficult to change. This is auto-generated code.

explain analyze SELECT DISTINCT members_.emailaddr_, members_.memberid_
FROM members_ WHERE (
members_.List_='list1'
AND members_.MemberType_='normal'
AND members_.SubType_='mail'
AND members_.emailaddr_ IS NOT NULL
) AND (
( select count(*) from lyrActiveRecips, members_ a, outmail_
where lyrActiveRecips.UserName = a.UserNameLC_
and lyrActiveRecips.Domain = a.Domain_
and a.MemberID_ = members_.MemberID_
and outmail_.MessageID_ = lyrActiveRecips.MailingID
and outmail_.Type_ = 'list'
and lyrActiveRecips.NextAttempt > '2004-01-20 00:00:00'
)
+
( select count(*) from lyrCompletedRecips, members_ a, outmail_
where a.MemberID_ = lyrCompletedRecips.MemberID
and a.UserNameLC_ = members_.UserNameLC_
and a.Domain_ = members_.Domain_
and outmail_.MessageID_ = lyrCompletedRecips.MailingID
and outmail_.Type_ = 'list'
and lyrCompletedRecips.FinalAttempt > '2004-01-20 00:00:00'
and lyrCompletedRecips.CompletionStatusID = 300 )
= 3
)
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=537.06..537.07 rows=1 width=72) (actual
time=114460.908..114460.908 rows=0 loops=1)
-> Sort (cost=537.06..537.06 rows=1 width=72) (actual
time=114460.905..114460.905 rows=0 loops=1)
Sort Key: emailaddr_, memberid_
-> Index Scan using ix_members_list_notifyerr on members_
(cost=0.00..537.05 rows=1 width=72) (actual time=114460.893..114460.893
rows=0 loops=1)
Index Cond: ((list_)::text = 'list1'::text)
Filter: (((membertype_)::text = 'normal'::text) AND
((subtype_)::text = 'mail'::text) AND (emailaddr_ IS NOT NULL) AND
(((subplan) + (subplan)) = 3))
SubPlan
-> Aggregate (cost=52.39..52.39 rows=1 width=0)
(actual time=0.089..0.090 rows=1 loops=818122)
-> Hash Join (cost=47.55..52.39 rows=1 width=0)
(actual time=0.086..0.086 rows=0 loops=818122)
Hash Cond: ("outer".memberid_ =
"inner".memberid)
-> Index Scan using ix_members_emaillc on
members_ a (cost=0.00..4.83 rows=1 width=4) (actual time=0.077..0.081
rows=1 loops=818122)
Index Cond: (((domain_)::text =
($2)::text) AND ((usernamelc_)::text = ($1)::text))
-> Hash (cost=47.55..47.55 rows=1
width=4) (actual time=0.025..0.025 rows=0 loops=1)
-> Hash Join (cost=25.00..47.55
rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=1)
Hash Cond: ("outer".messageid_
= "inner".mailingid)
-> Seq Scan on outmail_
(cost=0.00..22.50 rows=6 width=4) (actual time=0.001..0.001 rows=0
loops=1)
Filter: ((type_)::text =
'list'::text)
-> Hash (cost=25.00..25.00
rows=2 width=8) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on
lyrcompletedrecips (cost=0.00..25.00 rows=2 width=8) (actual
time=0.001..0.001 rows=0 loops=1)
Filter:
((finalattempt > '2004-01-20 00:00:00'::timestamp without time zone) AND
(completionstatusid = 300))
-> Aggregate (cost=51.59..51.59 rows=1 width=0)
(actual time=0.033..0.034 rows=1 loops=818122)
-> Hash Join (cost=27.35..51.59 rows=1 width=0)
(actual time=0.028..0.028 rows=0 loops=818122)
Hash Cond: ((("outer".username)::text =
("inner".usernamelc_)::text) AND (("outer"."domain")::text =
("inner".domain_)::text))
-> Hash Join (cost=22.52..46.72 rows=3
width=211) (actual time=0.003..0.003 rows=0 loops=818122)
Hash Cond: ("outer".mailingid =
"inner".messageid_)
-> Seq Scan on lyractiverecips
(cost=0.00..22.50 rows=334 width=215) (actual time=0.001..0.001 rows=0
loops=818122)
Filter: (nextattempt >
'2004-01-20 00:00:00'::timestamp without time zone)
-> Hash (cost=22.50..22.50 rows=6
width=4) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on outmail_
(cost=0.00..22.50 rows=6 width=4) (actual time=0.002..0.002 rows=0
loops=1)
Filter: ((type_)::text =
'list'::text)
-> Hash (cost=4.82..4.82 rows=2
width=211) (actual time=0.017..0.017 rows=0 loops=818122)
-> Index Scan using pk_members_ on
members_ a (cost=0.00..4.82 rows=2 width=211) (actual time=0.011..0.013
rows=1 loops=818122)
Index Cond: (memberid_ = $0)
Total runtime: 114474.407 ms
(34 rows)

that's with no data in lyractiverecips or lyrcompletedrecips. With data
in those tables, the query still hasn't completed after several hours on
two different machines.

thanks,
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack(at)lyris(dot)com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-01-29 02:04:49 Re: query optimization question
Previous Message Nick Barr 2004-01-28 22:08:40 Re: Linux / Clariion