Re: Performance problem with joined aggregate query

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Performance problem with joined aggregate query
Date: 2010-09-16 14:52:49
Message-ID: AANLkTi=8tf+tx1yPM4ehNNenmyg4jv8+e_BFnKiPwHdD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Sep 16, 2010 at 1:51 AM, Anssi Kääriäinen
<anssi(dot)kaariainen(at)thl(dot)fi> wrote:
> Yes, looks like this is the same case. This makes it hard to use views
> having group by in them, as the whole group by part will always be
> executed. Back to planning board then...
>
> I guess my possibilities for pivot views are:
>  - crosstab: Will make statistics go "bad", that is, the crosstab query
>    will always seem to return static number of rows. This can cause
>    problems in complex queries using the view. IIRC performance is
>    a bit worse than pivot by group by.
>  - left joins: if joining the same table 20 times, there will be some
>    planner overhead. Maybe the best way for my usage case. Also about
>    2x slower than pivot using group by.
>  - subselect each of the columns: way worse performance: for my use
>    case, each added column adds about 50ms to run time, so for 20
>    columns this will take 1 second. The group by pivot query runs in
>    250ms.
>
> Any other ideas?

yes. specifically, if you are targeting the aggregation towards an
array you have another option:

analyze select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;

can be replaced w/

select *, array(select value from test2 where test2.id=test1.id) from
test1 where test1.other_id = 10001;

This (array vs array_agg) will give you faster performance than
aggregation so is better preferred unless you need to group for other
purposes than relating. I think the join if it could match up over
the group by key can give theoretically better plans but this should
be good enough especially if you aren't pulling a large amount of data
from the outer table.

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2010-09-16 18:39:22 Re: Where does data in pg_stat_user_tables come from?
Previous Message Kevin Grittner 2010-09-16 14:25:32 Re: Is disableing nested_loops a bad idea ?