Re: Speed difference between select ... union select ... and select from partitioned_table

From: Pablo Alcaraz <pabloa(at)laotraesquina(dot)com(dot)ar>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Speed difference between select ... union select ... and select from partitioned_table
Date: 2007-10-27 05:14:21
Message-ID: 4722C92D.8050501@laotraesquina.com.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

These are the EXPLAIN ANALIZE:

I ran both queries on a CLUSTER and ANALYZEd tables:

UNION QUERY
explain analyze
select e, p, sum( c) as c
from (
select e, p, count( *) as c
from tt_00003
group by e, p
union
select e, p, count( *) as c
from tt_00006
group by e, p
union
select e, p, count( *) as c
from tt_00009
group by e, p
union
select e, p, count( *) as c
from tt_00012
group by e, p
union
select e, p, count( *) as c
from tt_00015
group by e, p
) as t
group by e, p
order by e, p desc;

"Sort (cost=2549202.87..2549203.37 rows=200 width=16) (actual
time=263593.182..263593.429 rows=207 loops=1)"
" Sort Key: t.e, t.p"
" -> HashAggregate (cost=2549192.73..2549195.23 rows=200 width=16)
(actual time=263592.469..263592.763 rows=207 loops=1)"
" -> Unique (cost=2549172.54..2549179.88 rows=734 width=8)
(actual time=263590.481..263591.764 rows=356 loops=1)"
" -> Sort (cost=2549172.54..2549174.38 rows=734 width=8)
(actual time=263590.479..263590.891 rows=356 loops=1)"
" Sort Key: e, p, c"
" -> Append (cost=1307131.88..2549137.60 rows=734
width=8) (actual time=132862.176..263589.774 rows=356 loops=1)"
" -> HashAggregate
(cost=1307131.88..1307133.03 rows=92 width=8) (actual
time=132862.173..132862.483 rows=200 loops=1)"
" -> Seq Scan on tt_00003
(cost=0.00..1081550.36 rows=30077536 width=8) (actual
time=10.135..83957.424 rows=30000000 loops=1)"
" -> HashAggregate
(cost=1241915.64..1241916.16 rows=42 width=8) (actual
time=130726.219..130726.457 rows=156 loops=1)"
" -> Seq Scan on tt_00006
(cost=0.00..1028793.22 rows=28416322 width=8) (actual
time=11.389..87338.730 rows=28351293 loops=1)"
" -> HashAggregate (cost=24.53..27.03
rows=200 width=8) (actual time=0.005..0.005 rows=0 loops=1)"
" -> Seq Scan on tt_00009
(cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0
loops=1)"
" -> HashAggregate (cost=24.53..27.03
rows=200 width=8) (actual time=0.004..0.004 rows=0 loops=1)"
" -> Seq Scan on tt_00012
(cost=0.00..18.30 rows=830 width=8) (actual time=0.002..0.002 rows=0
loops=1)"
" -> HashAggregate (cost=24.53..27.03
rows=200 width=8) (actual time=0.005..0.005 rows=0 loops=1)"
" -> Seq Scan on tt_00015
(cost=0.00..18.30 rows=830 width=8) (actual time=0.001..0.001 rows=0
loops=1)"
"Total runtime: 263594.381 ms"

PARTITIONED QUERY

explain analyze
select e, p, count( *) as c
from tt
group by e, p
order by e, p desc;

"GroupAggregate (cost=13256958.67..13842471.95 rows=40000 width=8)
(actual time=899391.384..1065585.531 rows=207 loops=1)"
" -> Sort (cost=13256958.67..13403211.99 rows=58501328 width=8)
(actual time=899391.364..989749.914 rows=58351293 loops=1)"
" Sort Key: public.tt.e, public.tt.p"
" -> Append (cost=0.00..2110508.28 rows=58501328 width=8)
(actual time=14.031..485211.466 rows=58351293 loops=1)"
" -> Seq Scan on tt (cost=0.00..18.30 rows=830 width=8)
(actual time=0.002..0.002 rows=0 loops=1)"
" -> Seq Scan on tt_00003 tt (cost=0.00..1081550.36
rows=30077536 width=8) (actual time=14.024..178657.738 rows=30000000
loops=1)"
" -> Seq Scan on tt_00006 tt (cost=0.00..1028793.22
rows=28416322 width=8) (actual time=39.852..168307.030 rows=28351293
loops=1)"
" -> Seq Scan on tt_00009 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.001..0.001 rows=0 loops=1)"
" -> Seq Scan on tt_00012 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.001..0.001 rows=0 loops=1)"
" -> Seq Scan on tt_00015 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.001..0.001 rows=0 loops=1)"
" -> Seq Scan on tt_00018 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.001..0.001 rows=0 loops=1)"
" -> Seq Scan on tt_00021 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.002..0.002 rows=0 loops=1)"
" -> Seq Scan on tt_00024 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.002..0.002 rows=0 loops=1)"
" -> Seq Scan on tt_00027 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.002..0.002 rows=0 loops=1)"
" -> Seq Scan on tt_00030 tt (cost=0.00..18.30 rows=830
width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"Total runtime: 1066301.084 ms"

Any idea?

Regards

Pablo

Jeff Davis wrote:
> On Fri, 2007-10-26 at 16:37 -0400, Pablo Alcaraz wrote:
>
>> Hi List!
>>
>> I executed 2 equivalents queries. The first one uses a union structure.
>> The second uses a partitioned table. The tables are the same with 30
>> millions of rows each one and the returned rows are the same.
>>
>> But the union query perform faster than the partitioned query.
>>
>>
>
> I think you mean to use UNION ALL here. UNION forces a DISTINCT, which
> results in a sort operation. What surprises me is that the UNION is
> actually faster than the partitioning using inheritance.
>
> I suspect it has something to do with the GROUP BYs, but we won't know
> until you post EXPLAIN ANALYZE results.
>
> Regards,
> Jeff Davis
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-10-27 05:37:29 Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1
Previous Message Anton 2007-10-27 04:26:21 Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1