Skip site navigation (1) Skip section navigation (2)

Re: optimizing query with multiple aggregates

From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Nikolas Everett" <nik9000(at)gmail(dot)com>, "Doug Cole" <dougcole(at)gmail(dot)com>
Cc: "pgsql-performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: optimizing query with multiple aggregates
Date: 2009-10-26 09:39:48
Message-ID: C4DAC901169B624F933534A26ED7DF31010A534F@JENMAIL01.ad.intershop.net (view raw or flat)
Thread:
Lists: pgsql-performance
Hello,
 
I didn't try it, but following should be slightly faster:
 
COUNT( CASE WHEN field >= x AND field < y THEN true END)
intead of 
SUM( CASE WHEN field >= x AND field < y THEN 1 ELSE 0 END)
 
HTH,
 
Marc Mamin


________________________________

From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Nikolas
Everett
Sent: Thursday, October 22, 2009 4:48 AM
To: Doug Cole
Cc: pgsql-performance
Subject: Re: [PERFORM] optimizing query with multiple aggregates


So you've got a query like: 

SELECT SUM(CASE WHEN field >= 0 AND field < 10 THEN 1 ELSE 0 END) as
zeroToTen,
              SUM(CASE WHEN field >= 10 AND field < 20 THEN 1 ELSE 0
END) as tenToTwenty,
              SUM(CASE WHEN field >= 20 AND field < 30 THEN 1 ELSE 0
END) as tenToTwenty,
...
FROM  bigtable




My guess is this forcing a whole bunch of if checks and your getting cpu
bound.  Could you try something like:


SELECT SUM(CASE WHEN field >= 0 AND field < 10 THEN count ELSE 0 END) as
zeroToTen,
              SUM(CASE WHEN field >= 10 AND field < 20 THEN count ELSE 0
END) as tenToTwenty,
              SUM(CASE WHEN field >= 20 AND field < 30 THEN count ELSE 0
END) as tenToTwenty,
...
FROM  (SELECT field, count(*) FROM bigtable GROUP BY field)


which will allow a hash aggregate?  You'd do a hash aggregate on the
whole table which should be quick and then you'd summarize your bins.


This all supposes that you don't want to just query postgres's column
statistics.


On Wed, Oct 21, 2009 at 10:21 PM, Doug Cole <dougcole(at)gmail(dot)com> wrote:


	On Wed, Oct 21, 2009 at 5:39 PM, Merlin Moncure
<mmoncure(at)gmail(dot)com> wrote:
	>
	> On Wed, Oct 21, 2009 at 6:51 PM, Doug Cole
<dougcole(at)gmail(dot)com> wrote:
	> > I have a reporting query that is taking nearly all of it's
time in aggregate
	> > functions and I'm trying to figure out how to optimize it.
The query takes
	> > approximately 170ms when run with "select *", but when run
with all the
	> > aggregate functions the query takes 18 seconds.  The
slowness comes from our
	> > attempt to find distribution data using selects of the form:
	> >
	> > SUM(CASE WHEN field >= x AND field < y THEN 1 ELSE 0 END)
	> >
	> > repeated across many different x,y values and fields to
build out several
	> > histograms of the data.  The main culprit appears to be the
CASE statement,
	> > but I'm not sure what to use instead.  I'm sure other people
have had
	> > similar queries and I was wondering what methods they used
to build out data
	> > like this?
	>
	> have you tried:
	>
	> count(*) where field >= x AND field < y;
	>
	> ??
	>
	> merlin
	
	
	Unless I'm misunderstanding you, that would require breaking
each bin
	into a separate sql statement and since I'm trying to calculate
more
	than 100 bins between the different fields any improvement in
the
	aggregate functions would be overwhelmed by the cost of the
actual
	query, which is about 170ms.
	Thanks,
	Doug
	

	--
	Sent via pgsql-performance mailing list
(pgsql-performance(at)postgresql(dot)org)
	To make changes to your subscription:
	http://www.postgresql.org/mailpref/pgsql-performance
	


In response to

pgsql-performance by date

Next:From: Michal J. KubskiDate: 2009-10-26 10:05:23
Subject: Re: query planning different in plpgsql?
Previous:From: decibelDate: 2009-10-25 20:43:56
Subject: Re: Domain vs table

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group