Re: Performance problem with joined aggregate query

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Performance problem with joined aggregate query
Date: 2010-09-15 22:25:15
Message-ID: AANLkTinQRvZ=m0uusv4qcvgVe6ToVWumMt8bGyBuq3i0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Sep 15, 2010 at 2:26 AM, Anssi Kääriäinen
<anssi(dot)kaariainen(at)thl(dot)fi> wrote:
> Hello all,
>
> I am trying to use aggregate queries in views, and when joining these views
> to other
> tables, I get seq scan in the view, even if index scan would be clearly
> better. The views
> I am using in my Db are actually long pivot queries, but the following
> simple test case is enough
> to show the problem.
>
> I will first show the table definitions, then the performance problem I am
> having.
>
> create table test1 (
>   id serial primary key not null,
>   other_id integer unique not null
> );
>
> create table test2 (
>    id integer not null references test1(id),
>    type integer,
>    value text
> );
>
> create index test2_idx on test2(id);
>
> insert into test1 select g, g+10000 from (select generate_series(1, 10000)
> as g) t;
> insert into test2 select g, g%3, 'testval'||g from (select
> generate_series(1, 10000) as g) t;
> insert into test2 select g, (g+1)%3, 'testval'||g from (select
> generate_series(1, 10000) as g) t;
> insert into test2 select g, (g+2)%3, 'testval'||g from (select
> generate_series(1, 10000) as g) t;
>
> Now, the following query is fast:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.id = 1;
> (0.6ms)
>
> But the following query is slow (seqscan on test2):
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
> (45ms)
>
> The same problem can be seen when running:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.id in (1, 2);
> (40ms runtime)
>
> Fetching directly from test2 with id is fast:
>
> select array_agg(value), id
> from test2 where test2.id in (1, 2) group by id;
>
> If I set enable_seqscan to off, then I get fast results:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id in (10001,
> 10002);
> (0.6ms)
>
> Or slow results, if the fetched rows happen to be in the end of the index:
>
> select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 20000;
> (40ms)
>
> Explain analyzes of the problematic query:
>
> With enable_seqscan:
>
> explain analyze select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
>
> Hash Join  (cost=627.48..890.48 rows=50 width=44) (actual
> time=91.575..108.085 rows=1 loops=1)
>   Hash Cond: (test2.id = test1.id)
>   ->  HashAggregate  (cost=627.00..752.00 rows=10000 width=15) (actual
> time=82.663..98.281 rows=10000 loops=1)
>         ->  Seq Scan on test2  (cost=0.00..477.00 rows=30000 width=15)
> (actual time=0.009..30.650 rows=30000 loops=1)
>   ->  Hash  (cost=0.47..0.47 rows=1 width=8) (actual time=0.026..0.026
> rows=1 loops=1)
>         ->  Index Scan using test1_other_id_key on test1  (cost=0.00..0.47
> rows=1 width=8) (actual time=0.018..0.021 rows=1 loops=1)
>               Index Cond: (other_id = 10001)
> Total runtime: 109.686 ms
>
> Without enable_seqscan:
>
> explain analyze select * from test1 inner join (select array_agg(value), id
> from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
>
> Merge Join  (cost=0.48..895.91 rows=50 width=44) (actual time=0.066..0.085
> rows=1 loops=1)
>   Merge Cond: (test2.id = test1.id)
>   ->  GroupAggregate  (cost=0.00..769.56 rows=10000 width=15) (actual
> time=0.040..0.054 rows=2 loops=1)
>         ->  Index Scan using test2_idx on test2  (cost=0.00..494.56
> rows=30000 width=15) (actual time=0.017..0.030 rows=7 loops=1)
>   ->  Sort  (cost=0.48..0.48 rows=1 width=8) (actual time=0.020..0.022
> rows=1 loops=1)
>         Sort Key: test1.id
>         Sort Method:  quicksort  Memory: 17kB
>         ->  Index Scan using test1_other_id_key on test1  (cost=0.00..0.47
> rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
>               Index Cond: (other_id = 10001)

Take a look at this, and the responses. Is it the same case?:
http://www.mail-archive.com/pgsql-performance(at)postgresql(dot)org/msg21756.html

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2010-09-16 00:32:32 Re: locking issue on simple selects?
Previous Message Tobias Brox 2010-09-15 21:33:00 Re: locking issue on simple selects?