slow count in window query

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: slow count in window query
Date: 2009-07-15 10:18:54
Message-ID: 162867790907150318y3dccb77ep1ba6eb94742e72c8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I did some test - median via window function - I found probably some
bad optimised code. I found two methods - Celko and Itzik Ben-Gan.
Ben-Gan methoud should to be faster - there is one sort less, but in
practice - it is 2 times slower.

create table x(a integer);
insert into x select (random()*10000)::int from generate_series(1,10000);

Celko method:
postgres=# explain select avg(a)
from (select a, row_number() over
(order by a asc) as hi,
row_number()
over (order by a desc) as lo,
from x) s
where hi in (lo-1,lo+1);
QUERY PLAN
-------------------------------------------------------------------------------------------------
Aggregate (cost=2144.02..2144.03 rows=1 width=4)
-> Subquery Scan s (cost=1643.77..2143.77 rows=100 width=4)
Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
-> WindowAgg (cost=1643.77..1943.77 rows=10000 width=4)
-> WindowAgg (cost=1643.77..1818.77 rows=10000 width=4)
-> Sort (cost=1643.77..1668.77 rows=10000 width=4)
Sort Key: x.a
-> WindowAgg (cost=804.39..979.39
rows=10000 width=4)
-> Sort (cost=804.39..829.39
rows=10000 width=4)
Sort Key: x.a
-> Seq Scan on x
(cost=0.00..140.00 rows=10000 width=4)
(11 rows)

Ben-Gan:

postgres=# explain select avg(a)
from (select a, row_number() over
(order by a) as r,
count(*) over () as rc
from x ) p
where r in ((rc+1)/2,(rc+2)/2) ;
QUERY PLAN
-------------------------------------------------------------------------------------
Aggregate (cost=1354.64..1354.65 rows=1 width=4)
-> Subquery Scan p (cost=804.39..1354.39 rows=100 width=4)
Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
-> WindowAgg (cost=804.39..1104.39 rows=10000 width=4)
-> WindowAgg (cost=804.39..979.39 rows=10000 width=4)
-> Sort (cost=804.39..829.39 rows=10000 width=4)
Sort Key: x.a
-> Seq Scan on x (cost=0.00..140.00
rows=10000 width=4)
(8 rows)

but
postgres=# select avg(a) from (select a, row_number() over (order by
a) as r, count(*) over () as rc from x ) p where r in
((rc+1)/2,(rc+2)/2) ;
avg
-----------------------
5027.0000000000000000
(1 row)

Time: 179,310 ms

postgres=# select avg(a) from (select a, row_number() over (order by a
asc) as hi, row_number() over (order by a desc) as lo from x) s where
hi in (lo-1,lo+1);
avg
-----------------------
5027.0000000000000000
(1 row)

Time: 78,791 ms

When I checked diff, I found, so the problem is count() function.

count(*) over () is very slow. - maybe so this is standard aggregate?

Regards
Pavel Stehule

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-07-15 10:22:41 Re: Mysql.whynot or PG vs MySQL comparison table?
Previous Message Tsutomu Yamada 2009-07-15 09:20:30 Re: [PATCH] "could not reattach to shared memory" on Windows