Quick Links

Re: optimizing query with multiple aggregates

From:	Nikolas Everett <nik9000(at)gmail(dot)com>
To:	Doug Cole <dougcole(at)gmail(dot)com>
Cc:	pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: optimizing query with multiple aggregates
Date:	2009-10-22 02:47:45
Message-ID:	d4e11e980910211947r43f3be24r83186126baef288f@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

So you've got a query like:
SELECT SUM(CASE WHEN field >= 0 AND field < 10 THEN 1 ELSE 0 END) as
zeroToTen,
SUM(CASE WHEN field >= 10 AND field < 20 THEN 1 ELSE 0 END) as
tenToTwenty,
SUM(CASE WHEN field >= 20 AND field < 30 THEN 1 ELSE 0 END) as
tenToTwenty,
...
FROM bigtable

My guess is this forcing a whole bunch of if checks and your getting cpu
bound. Could you try something like:

SELECT SUM(CASE WHEN field >= 0 AND field < 10 THEN count ELSE 0 END) as
zeroToTen,
SUM(CASE WHEN field >= 10 AND field < 20 THEN count ELSE 0
END) as tenToTwenty,
SUM(CASE WHEN field >= 20 AND field < 30 THEN count ELSE 0
END) as tenToTwenty,
...
FROM (SELECT field, count(*) FROM bigtable GROUP BY field)

which will allow a hash aggregate? You'd do a hash aggregate on the whole
table which should be quick and then you'd summarize your bins.

This all supposes that you don't want to just query postgres's column
statistics.

On Wed, Oct 21, 2009 at 10:21 PM, Doug Cole <dougcole(at)gmail(dot)com> wrote:

> On Wed, Oct 21, 2009 at 5:39 PM, Merlin Moncure <mmoncure(at)gmail(dot)com>
> wrote:
> >
> > On Wed, Oct 21, 2009 at 6:51 PM, Doug Cole <dougcole(at)gmail(dot)com> wrote:
> > > I have a reporting query that is taking nearly all of it's time in
> aggregate
> > > functions and I'm trying to figure out how to optimize it. The query
> takes
> > > approximately 170ms when run with "select *", but when run with all the
> > > aggregate functions the query takes 18 seconds. The slowness comes
> from our
> > > attempt to find distribution data using selects of the form:
> > >
> > > SUM(CASE WHEN field >= x AND field < y THEN 1 ELSE 0 END)
> > >
> > > repeated across many different x,y values and fields to build out
> several
> > > histograms of the data. The main culprit appears to be the CASE
> statement,
> > > but I'm not sure what to use instead. I'm sure other people have had
> > > similar queries and I was wondering what methods they used to build out
> data
> > > like this?
> >
> > have you tried:
> >
> > count(*) where field >= x AND field < y;
> >
> > ??
> >
> > merlin
>
> Unless I'm misunderstanding you, that would require breaking each bin
> into a separate sql statement and since I'm trying to calculate more
> than 100 bins between the different fields any improvement in the
> aggregate functions would be overwhelmed by the cost of the actual
> query, which is about 170ms.
> Thanks,
> Doug
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Re: optimizing query with multiple aggregates at 2009-10-22 02:21:36 from Doug Cole

Responses

Re: optimizing query with multiple aggregates at 2009-10-26 09:39:48 from Marc Mamin

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Robert Haas	2009-10-22 03:16:28	Re: Random penalties on GIN index updates?
Previous Message	David Wilson	2009-10-22 02:47:31	Re: optimizing query with multiple aggregates