Quick Links

Re: Performance problem with joined aggregate query

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Performance problem with joined aggregate query
Date:	2010-09-15 22:25:15
Message-ID:	AANLkTinQRvZ=m0uusv4qcvgVe6ToVWumMt8bGyBuq3i0@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Wed, Sep 15, 2010 at 2:26 AM, Anssi Kääriäinen
<anssi(dot)kaariainen(at)thl(dot)fi> wrote:
> Hello all,
>
> I am trying to use aggregate queries in views, and when joining these views
> to other
> tables, I get seq scan in the view, even if index scan would be clearly
> better. The views
> I am using in my Db are actually long pivot queries, but the following
> simple test case is enough
> to show the problem.
>
> I will first show the table definitions, then the performance problem I am
> having.
>
> create table test1 (
> id serial primary key not null,
> other_id integer unique not null
> );
>
> create table test2 (
> id integer not null references test1(id),
> type integer,
> value text
> );
>
> create index test2_idx on test2(id);
>
> insert into test1 select g, g+10000 from (select generate_series(1, 10000)
> as g) t;
> insert into test2 select g, g%3, 'testval'||g from (select
> generate_series(1, 10000) as g) t;
> insert into test2 select g, (g+1)%3, 'testval'||g from (select
> generate_series(1, 10000) as g) t;
> insert into test2 select g, (g+2)%3, 'testval'||g from (select
> generate_series(1, 10000) as g) t;
>
> Now, the following query is fast:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.id = 1;
> (0.6ms)
>
> But the following query is slow (seqscan on test2):
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
> (45ms)
>
> The same problem can be seen when running:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.id in (1, 2);
> (40ms runtime)
>
> Fetching directly from test2 with id is fast:
>
> select array_agg(value), id
> from test2 where test2.id in (1, 2) group by id;
>
> If I set enable_seqscan to off, then I get fast results:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id in (10001,
> 10002);
> (0.6ms)
>
> Or slow results, if the fetched rows happen to be in the end of the index:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 20000;
> (40ms)
>
> Explain analyzes of the problematic query:
>
> With enable_seqscan:
>
> explain analyze select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
>
> Hash Join (cost=627.48..890.48 rows=50 width=44) (actual
> time=91.575..108.085 rows=1 loops=1)
> Hash Cond: (test2.id = test1.id)
> -> HashAggregate (cost=627.00..752.00 rows=10000 width=15) (actual
> time=82.663..98.281 rows=10000 loops=1)
> -> Seq Scan on test2 (cost=0.00..477.00 rows=30000 width=15)
> (actual time=0.009..30.650 rows=30000 loops=1)
> -> Hash (cost=0.47..0.47 rows=1 width=8) (actual time=0.026..0.026
> rows=1 loops=1)
> -> Index Scan using test1_other_id_key on test1 (cost=0.00..0.47
> rows=1 width=8) (actual time=0.018..0.021 rows=1 loops=1)
> Index Cond: (other_id = 10001)
> Total runtime: 109.686 ms
>
> Without enable_seqscan:
>
> explain analyze select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
>
> Merge Join (cost=0.48..895.91 rows=50 width=44) (actual time=0.066..0.085
> rows=1 loops=1)
> Merge Cond: (test2.id = test1.id)
> -> GroupAggregate (cost=0.00..769.56 rows=10000 width=15) (actual
> time=0.040..0.054 rows=2 loops=1)
> -> Index Scan using test2_idx on test2 (cost=0.00..494.56
> rows=30000 width=15) (actual time=0.017..0.030 rows=7 loops=1)
> -> Sort (cost=0.48..0.48 rows=1 width=8) (actual time=0.020..0.022
> rows=1 loops=1)
> Sort Key: test1.id
> Sort Method: quicksort Memory: 17kB
> -> Index Scan using test1_other_id_key on test1 (cost=0.00..0.47
> rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
> Index Cond: (other_id = 10001)

Take a look at this, and the responses. Is it the same case?:
http://www.mail-archive.com/pgsql-performance(at)postgresql(dot)org/msg21756.html

merlin

In response to

Performance problem with joined aggregate query at 2010-09-15 06:26:14 from Anssi Kääriäinen

Responses

Re: Performance problem with joined aggregate query at 2010-09-16 05:51:19 from Anssi Kääriäinen

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Greg Smith	2010-09-16 00:32:32	Re: locking issue on simple selects?
Previous Message	Tobias Brox	2010-09-15 21:33:00	Re: locking issue on simple selects?