Re: slow group by query

From: Ellen Cyran <ellen(at)urban(dot)csuohio(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: slow group by query
Date: 2002-11-19 17:47:59
Message-ID: 5.1.1.6.0.20021119124548.01afcb88@wolf.csuohio.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


That's a whole lot faster. The query on 40 msa_codes that took
7 minutes, now only takes 10 seconds.
Thanks a lot.

At 11:54 AM 11/19/2002 -0500, Tom Lane wrote:
>Ellen Cyran <ellen(at)urban(dot)csuohio(dot)edu> writes:
> > Here is the explain analyze:
>
> > Group (cost=637.18..696.52 rows=593 width=22) (actual
> time=982.67..67581.85 rows=435 loops=1)
> > -> Sort (cost=637.18..637.18 rows=5934 width=22) (actual
> time=833.27..844.78 rows=6571 loops=1)
>
>Well, we don't have to read any further than that to see that all the
>time is being spent in the final Group step --- and since grouping 6500
>rows is surely not taking very long, the true cost must be in the
>evaluation of the SELECT's output targetlist (which the estimated costs
>ignore, since the planner can't do much to change it). In other words,
>what's hurting you are those subselects in the SELECT list. You're
>doing two thousand separate subselects (435*5) --- evidently they take
>about 30 msec apiece, which isn't that bad by itself, but it adds up.
>
>What you've basically got here is what Joe Celko calls a "crosstab
>by subqueries" (_SQL For Smarties_, section 23.6.3). You might want
>to buy his book, which shows several other ways to do crosstab queries,
>but the one that seems most directly useful is to GROUP BY and use
>aggregate functions to extract the values you want in each crosstab
>column:
>
>SELECT msa_code, sic, own, ind_div,
>max(case when year = '1975' then emp else null end) as emp1975,
>max(case when year = '1976' then emp else null end) as emp1976,
>max(case when year = '1977' then emp else null end) as emp1977,
>max(case when year = '1978' then emp else null end) as emp1978,
>max(case when year = '1990' then emp else null end) as emp1990
>FROM tbl_bls_msa
>WHERE msa_code in ('1680','1640','0040','0120','0080')
>GROUP BY msa_code, sic, ind_div, own ;
>
>If I understand your data schema correctly, only one row in a
>msa_code, sic, ind_div, own group will have a particular year
>value, so the case/max structure extracts that value, or gives
>NULL if there's no such row. (MIN would have worked as well;
>with a numeric field you could use SUM too.)
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Chris Gamache 2002-11-19 18:29:40 Using VIEW to simplify code...
Previous Message Joe Conway 2002-11-19 17:27:26 Re: Proposal of hierarchical queries, a la Oracle