Skip site navigation (1) Skip section navigation (2)

Re: query optimization question

From: Jack Coates <jack(at)lyris(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query optimization question
Date: 2004-01-29 17:23:00
Message-ID: 1075396980.7494.107.camel@cletus.lyris.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Wed, 2004-01-28 at 18:04, Tom Lane wrote:
> Jack Coates <jack(at)lyris(dot)com> writes:
> > I've got a query that needs some help, please. Is there a way to avoid
> > all the looping? I've got freedom to work with the double-indented
> > sections below ) AND (, but the initial select distinct wrapper is much
> > more difficult to change. This is auto-generated code.
> 
> Well, you're not going to get any serious improvement without a
> wholesale rewrite of the query --- I'd think that something driven by
> a GROUP BY memberid_ HAVING count(*) = whatever at the outer level would
> be a better way to approach it.  As you have it, the system has no
> choice but to fully evaluate two very expensive subselects, from scratch,
> for each outer row.
> 

I hear you. There's definitely an understanding that this tool can
generate some gnarly queries, and we want to redesign in a way that will
allow some more intelligence to be applied to the problem. In the
meantime, I'll be happy if PG grinds at the same level as other
databases. MS-SQL completed that query in 25 minutes on a database with
31 times the data in it. Since I'm one of the bigger *nix fans around
here, that doesn't make me happy.

> However...
> 
> > 		( select count(*) from lyrActiveRecips, members_ a, outmail_  
> > 		where lyrActiveRecips.UserName = a.UserNameLC_  
> > 		and lyrActiveRecips.Domain = a.Domain_  
> > 		and a.MemberID_ = members_.MemberID_  
> > 		and outmail_.MessageID_ = lyrActiveRecips.MailingID  
> 
> Is memberid_ a unique identifier for members_, as one would think from
> the name?  If so, can't you drop the join of members_ a in this
> subselect, and just use the corresponding fields from the outer table?
> 
> > 		( select count(*) from lyrCompletedRecips, members_ a, outmail_
> > 		where a.MemberID_ = lyrCompletedRecips.MemberID  
> > 		and a.UserNameLC_ = members_.UserNameLC_  
> > 		and a.Domain_ = members_.Domain_  
> > 		and outmail_.MessageID_ = lyrCompletedRecips.MailingID  
> 
> Why are the join conditions different here from the other subselect?
> Can't you rephrase them the same as above, and then again remove the
> inner appearance of members_ ?
> 
> 			regards, tom lane

unfortunately, the column names are different between lyrcompletedrecips
and lyractiverecips. However, one thing we were able to do is to reduce
the number of queries by not trying to match across multiple lists.

SELECT DISTINCT members_.emailaddr_, members_.memberid_ FROM members_ 
WHERE ( members_.List_='list1' 
	AND members_.MemberType_='normal' 
	AND members_.SubType_='mail' 
	AND members_.emailaddr_ IS NOT NULL ) 
AND ( 
	( select count(*) from lyrActiveRecips, outmail_ 
	where outmail_.MessageID_ = lyrActiveRecips.MailingID 
	and outmail_.Type_ = 'list' 
	and members_.MemberID_ = lyrActiveRecips.MemberID 
	and lyrActiveRecips.NextAttempt > '2004-01-20 00:00:00' )
	 + 
	( select count(*) from lyrCompletedRecips, outmail_ 
	where members_.MemberID_ = lyrCompletedRecips.MemberID 
	and outmail_.MessageID_ = lyrCompletedRecips.MailingID 
	and outmail_.Type_ = 'list' 
	and lyrCompletedRecips.FinalAttempt > '2004-01-20 00:00:00' 
	and lyrCompletedRecips.CompletionStatusID = 300 )
	 = 3
);

That completed in 3.5 minutes on MS-SQL. I killed the query this morning
after 15 hours on PostgreSQL 7.4. I tried a GROUP BY memberid_ HAVING
variation, which completed in 59 seconds on MS-SQL. I killed it after 35
minutes on PostgreSQL.

On a more positive note, if you remember the benchmarking I was doing
last month, PostgreSQL got some pretty good relative numbers. It
requires a lot of hand-holding and tuning relative to MS-SQL, but it
certainly beat the pants off of Oracle 8 and 9 for speed and ease of
management. Oracle 8 was in fact unable to complete the uglier stress
tests. I'll be working on a tuning recommendations white paper today.

thanks for all the help,
-- 
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack(at)lyris(dot)com
"Interoperability is the keyword, uniformity is a dead end."
				--Olivier Fourdan



In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2004-01-29 18:05:04
Subject: Re: query optimization question
Previous:From: Bruno Wolff IIIDate: 2004-01-29 15:43:19
Subject: Re: [SQL] limit 1 and functional indexes

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group