Skip site navigation (1) Skip section navigation (2)

Re: query optimization question

From: Jack Coates <jack(at)lyris(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query optimization question
Date: 2004-01-29 21:23:28
Message-ID: 1075411407.7494.130.camel@cletus.lyris.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, 2004-01-29 at 11:31, Tom Lane wrote:
> Jack Coates <jack(at)lyris(dot)com> writes:
> > jackdb=# explain SELECT DISTINCT members_.memberid_ 
> > jackdb-# FROM members_ 
> > jackdb-# WHERE ( members_.List_='list1' 
> > jackdb(# AND members_.MemberType_='normal' 
> > jackdb(# AND members_.SubType_='mail' 
> > jackdb(# AND members_.emailaddr_ IS NOT NULL ) 
> > jackdb-# GROUP BY memberid_ HAVING ( 
> 
> Um, that's not what I had in mind at all.  Does GROUP BY actually do
> anything at all here?  (You didn't answer me as to whether memberid_
> is a unique identifier or not, but if it is, this GROUP BY is just an
> expensive no-op.)
> 

Sorry for the misunderstanding. It should be unique, yes.

> What I was envisioning was pulling the sub-selects up to the top level
> and using grouping to calculate the count(*) values for all memberids
> in parallel.  Roughly speaking it would look like (again assuming
> memberid_ is unique)
> 
> SELECT memberid_ FROM
> (
>   SELECT memberid_ FROM lyrActiveRecips, members_, outmail
>   WHERE (all the conditions for this case)
>   UNION ALL
>   SELECT memberid_ FROM lyrCompletedRecips, members_, outmail
>   WHERE (all the conditions for this case)
> )
> GROUP BY memberid_ HAVING count(*) = 3;
> 
> However, if you can't change the boilerplate part of your query then
> this is all blue-sky speculation anyway.  

Got it now -- I'm running into some subquery errors trying to implement
this, anyway.

> What I'm actually more
> interested in is your statement that MSSQL can do the original query
> quickly.  I find that a bit hard to believe because I don't see any
> relevant optimization techniques.  Do they have any equivalent to
> EXPLAIN that would give some hint how they're doing it?

yup -- here it is. It will probably be a nasty mess after linewrap gets
done with it, so let me know if you'd like me to post a copy on ftp.

SELECT DISTINCT members_.memberid_   FROM members_   WHERE (
members_.List_='list1'    AND members_.MemberType_='normal'    AND
members_.SubType_='mail' )   GROUP BY memberid_ HAVING (       ( select
count(*) from lyrActiveRecips, outmail_    where
outmail	11	1	0	NULL	NULL	1	NULL	102274.5	NULL	NULL	NULL	104.10356	NULL	NULL	SELECT	0	NULL
  |--Parallelism(Gather Streams)	11	2	1	Parallelism	Gather
Streams	NULL	NULL	102274.5	0.0	0.22011127	23	104.10356	[members_].[MemberID_]	NULL	PLAN_ROW	-1	1.0
       |--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else
[Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3))	11	3	2	Filter	Filter	WHERE:(If ([Expr1006] IS NULL) then
0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3)	NULL	102274.5	0.0	3.5393338	23	103.88345	[members_].[MemberID_]	NULL	PLAN_ROW	-1	1.0
            |--Hash Match(Right Outer Join,
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]))	11	4	3	Hash Match	Right Outer Join	HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]), RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])	NULL	4782883.5	0.0	21.874712	23	100.34412	[members_].[MemberID_], [Expr1006], [Expr1012]	NULL	PLAN_ROW	-1	1.0
                 |--Compute
Scalar(DEFINE:([Expr1012]=Convert([Expr1020])))	11	5	4	Compute
Scalar	Compute
Scalar	DEFINE:([Expr1012]=Convert([Expr1020]))	[Expr1012]=Convert([Expr1020])	119575.35	0.0	1.3723248	15	4.3749919	[lyrCompletedRecips].[MemberID], [Expr1012]	NULL	PLAN_ROW	-1	1.0
                 |    |--Hash Match(Aggregate,
HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) DEFINE:([Expr1020]=COUNT(*)))	11	6	5	Hash Match	Aggregate	HASH:([lyrCompletedRecips].[MemberID]), RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID])	[Expr1020]=COUNT(*)	119575.35	0.0	1.3723248	15	4.3749919	[lyrCompletedRecips].[MemberID], [Expr1020]	NULL	PLAN_ROW	-1	1.0
                 |         |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([lyrCompletedRecips].[MemberID]))	11	7	6	Parallelism	Repartition Streams	PARTITION COLUMNS:([lyrCompletedRecips].[MemberID])	NULL	119640.6	0.0	0.32407209	173	3.002667	[lyrCompletedRecips].[MemberID]	NULL	PLAN_ROW	-1	1.0
                 |              |--Nested Loops(Inner Join, OUTER
REFERENCES:([outmail_].[MessageID_]))	11	8	7	Nested Loops	Inner
Join	OUTER
REFERENCES:([outmail_].[MessageID_])	NULL	119640.6	0.0	0.75014657	173	2.6785948	[lyrCompletedRecips].[MemberID]	NULL	PLAN_ROW	-1	1.0
                 |                   |--Parallelism(Distribute
Streams)	11	9	8	Parallelism	Distribute
Streams	NULL	NULL	1.0	0.0	2.8501874E-2	128	9.4664574E-2	[outmail_].[MessageID_]	NULL	PLAN_ROW	-1	1.0
                 |                   |    |--Clustered Index
Scan(OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list'))	11	10	9	Clustered Index
Scan	Clustered Index
Scan	OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list')	[outmail_].[Type_],
[outmail_].[MessageID_]	1.0	0.01878925	3.9800001E-5	128	3.7658099E-2	[outmail_].[Type_], [outmail_].[MessageID_]	NULL	PLAN_ROW	0	1.0
                 |                   |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]), SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]),  WHERE:([lyrCompletedRecips].[CompletionStatusID]=300 AN	11	11	8	Clustered Index Seek	Clustered Index Seek	OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]), SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]),  WHERE:([lyrCompletedRecips].[CompletionStatusID]=300 AND [lyrCompletedRecips].[FinalAttempt]>'Jan 20 2004 12:00AM') 	[lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt], [lyrCompletedRecips].[MemberID]	119640.6	0.5750553	0.13207871	53	1.5463468	[lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt], [lyrCompletedRecips].[MemberID]	NULL	PLAN_ROW	-1	3.0
                 |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([members_].[MemberID_]))	11	19	4	Parallelism	Repartition
Streams	PARTITION
COLUMNS:([members_].[MemberID_])	NULL	4782883.5	0.0	15.474822	19	74.094414	[members_].[MemberID_], [Expr1006]	NULL	PLAN_ROW	-1	1.0
                      |--Nested Loops(Left Outer Join,
WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID]))	11	20	19	Nested Loops	Left Outer Join	WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID])	NULL	4782883.5	0.0	9.9962263	19	58.619591	[members_].[MemberID_], [Expr1006]	NULL	PLAN_ROW	-1	1.0
                           |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'), 
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD)	11	22	20	Clustered Index
Seek	Clustered Index
Seek	OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'), 
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD	[members_].[SubType_],
[members_].[MemberType_],
[members_].[MemberID_]	4782883.5	40.160122	3.2745986	410	43.434719	[members_].[SubType_], [members_].[MemberType_], [members_].[MemberID_]	NULL	PLAN_ROW	-1	1.0
                           |--Table Spool	11	24	20	Table Spool	Lazy
Spool	NULL	NULL	1.0	1.6756756E-2	3.7999999E-7	15	0.90211391	[lyrActiveRecips].[MemberID], [Expr1006]	NULL	PLAN_ROW	-1	4782883.5
                                |--Compute
Scalar(DEFINE:([Expr1006]=Convert([Expr1021])))	11	25	24	Compute
Scalar	Compute
Scalar	DEFINE:([Expr1006]=Convert([Expr1021]))	[Expr1006]=Convert([Expr1021])	1.0	0.0	7.6000001E-6	15	2.4437904E-2	[lyrActiveRecips].[MemberID], [Expr1006]	NULL	PLAN_ROW	-1	1.0
                                     |--Stream Aggregate(GROUP
BY:([lyrActiveRecips].[MemberID])
DEFINE:([Expr1021]=Count(*)))	11	26	25	Stream Aggregate	Aggregate	GROUP
BY:([lyrActiveRecips].[MemberID])	[Expr1021]=Count(*)	1.0	0.0	7.6000001E-6	15	2.4437904E-2	[lyrActiveRecips].[MemberID], [Expr1021]	NULL	PLAN_ROW	-1	1.0
                                          |--Sort(ORDER
BY:([lyrActiveRecips].[MemberID] ASC))	11	27	26	Sort	Sort	ORDER
BY:([lyrActiveRecips].[MemberID]
ASC)	NULL	1.0	1.1261261E-2	1.00011E-4	11	2.4430305E-2	[lyrActiveRecips].[MemberID]	NULL	PLAN_ROW	-1	1.0
                                              
|--Filter(WHERE:([outmail_].[Type_]='list'))	11	28	27	Filter	Filter	WHERE:([outmail_].[Type_]='list')	NULL	1.0	0.0	4.7999998E-7	156	1.3069032E-2	[lyrActiveRecips].[MemberID]	NULL	PLAN_ROW	-1	1.0
                                                    |--Bookmark
Lookup(BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_]))	11	29	28	Bookmark Lookup	Bookmark
Lookup	BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_])	[outmail_].[Type_]	1.0	3.1249749E-3	0.0000011	156	1.3068552E-2	[lyrActiveRecips].[MemberID], [outmail_].[Type_]	NULL	PLAN_ROW	-1	1.0
                                                         |--Nested
Loops(Inner Join, OUTER
REFERENCES:([lyrActiveRecips].[MailingID]))	11	30	29	Nested Loops	Inner
Join	OUTER
REFERENCES:([lyrActiveRecips].[MailingID])	NULL	1.0	0.0	0.00001254	138	9.9424766E-3	[lyrActiveRecips].[MemberID], [Bmk1004]	NULL	PLAN_ROW	-1	1.0
                                                             
|--Bookmark Lookup(BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips]))	11	31	30	Bookmark
Lookup	Bookmark Lookup	BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips])	[lyrActiveRecips].[MemberID],
[lyrActiveRecips].[MailingID]	1.0	3.1249749E-3	0.0000011	53	6.4091529E-3	[lyrActiveRecips].[MemberID], [lyrActiveRecips].[MailingID]	NULL	PLAN_ROW	-1	1.0
                                                              |   
|--Index
Seek(OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD)	11	32	31	Index Seek	Index
Seek	OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD	[Bmk1002]	1.0	3.2034749E-3	7.9603E-5	40	3.2830781E-3	[Bmk1002]	NULL	PLAN_ROW	-1	1.0
                                                              |--Index
Seek(OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD)	11	33	30	Index Seek	Index
Seek	OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD	[Bmk1004]	1.0	3.2034749E-3	7.9603E-5	93	3.520784E-3	[Bmk1004]	NULL	PLAN_ROW	-1	3.0
-- 
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack(at)lyris(dot)com
"Interoperability is the keyword, uniformity is a dead end."
				--Olivier Fourdan



In response to

Responses

pgsql-performance by date

Next:From: Richard HuxtonDate: 2004-01-29 21:37:16
Subject: Re: Explain plan for 2 column index
Previous:From: Bill MoranDate: 2004-01-29 19:44:28
Subject: Re: [PERFORM] Set-Returning Functions WAS: On the performance of

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group