Quick Links

Re: Performance problem with joined aggregate query

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Performance problem with joined aggregate query
Date:	2010-09-16 14:52:49
Message-ID:	AANLkTi=8tf+tx1yPM4ehNNenmyg4jv8+e_BFnKiPwHdD@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Thu, Sep 16, 2010 at 1:51 AM, Anssi Kääriäinen
<anssi(dot)kaariainen(at)thl(dot)fi> wrote:
> Yes, looks like this is the same case. This makes it hard to use views
> having group by in them, as the whole group by part will always be
> executed. Back to planning board then...
>
> I guess my possibilities for pivot views are:
> - crosstab: Will make statistics go "bad", that is, the crosstab query
> will always seem to return static number of rows. This can cause
> problems in complex queries using the view. IIRC performance is
> a bit worse than pivot by group by.
> - left joins: if joining the same table 20 times, there will be some
> planner overhead. Maybe the best way for my usage case. Also about
> 2x slower than pivot using group by.
> - subselect each of the columns: way worse performance: for my use
> case, each added column adds about 50ms to run time, so for 20
> columns this will take 1 second. The group by pivot query runs in
> 250ms.
>
> Any other ideas?

yes. specifically, if you are targeting the aggregation towards an
array you have another option:

analyze select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;

can be replaced w/

select *, array(select value from test2 where test2.id=test1.id) from
test1 where test1.other_id = 10001;

This (array vs array_agg) will give you faster performance than
aggregation so is better preferred unless you need to group for other
purposes than relating. I think the join if it could match up over
the group by key can give theoretically better plans but this should
be good enough especially if you aren't pulling a large amount of data
from the outer table.

merlin

In response to

Re: Performance problem with joined aggregate query at 2010-09-16 05:51:19 from Anssi Kääriäinen

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Josh Berkus	2010-09-16 18:39:22	Re: Where does data in pg_stat_user_tables come from?
Previous Message	Kevin Grittner	2010-09-16 14:25:32	Re: Is disableing nested_loops a bad idea ?