Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries
Date: 2011-11-24 02:56:58
Message-ID: CAK-MWwTasteHbxcSN69jKfBbXJmxxUrtH7xL-jaBjEj3_2xfaQ@mail.gmail.com
Lists: pgsql-performance

Is there any reason why PostgreSQL evaluates subqueries/stored procedures
in the select list before applying ORDER BY / LIMIT?

I'm talking about cases like:

SELECT *,
  (some very slow subquery, or a slow stable/immutable stored procedure
   doing e.g. XML processing)
FROM
  some_table
ORDER BY
  some_field (unrelated to the subquery results)
LIMIT N
?

I have seen cases where this leads to a 3-6 order-of-magnitude slowdown.

Simplest test case:

CREATE TABLE test (id integer);
INSERT INTO test SELECT * FROM generate_series(1,1000);

Slow query (note loops=1000 on the subplan):

EXPLAIN ANALYZE
SELECT id, (SELECT count(*) FROM test t1 WHERE t1.id = t.id)
FROM test t ORDER BY id LIMIT 10;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13044.61..13044.63 rows=10 width=4) (actual time=158.636..158.641 rows=10 loops=1)
   ->  Sort  (cost=13044.61..13047.11 rows=1000 width=4) (actual time=158.636..158.639 rows=10 loops=1)
         Sort Key: t.id
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Seq Scan on test t  (cost=0.00..13023.00 rows=1000 width=4) (actual time=0.188..158.242 rows=1000 loops=1)
               SubPlan 1
                 ->  Aggregate  (cost=13.00..13.01 rows=1 width=0) (actual time=0.157..0.157 rows=1 loops=1000)
                       ->  Seq Scan on test t1  (cost=0.00..13.00 rows=1 width=0) (actual time=0.081..0.156 rows=1 loops=1000)
                             Filter: (id = t.id)
 Total runtime: 158.676 ms

Fast query (note loops=10 on the subplan):

EXPLAIN ANALYZE
SELECT id, (SELECT count(*) FROM test t1 WHERE t1.id = t.id)
FROM (SELECT id FROM test ORDER BY id LIMIT 10) AS t ORDER BY id;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t  (cost=32.11..162.36 rows=10 width=4) (actual time=1.366..4.770 rows=10 loops=1)
   ->  Limit  (cost=32.11..32.13 rows=10 width=4) (actual time=0.971..0.983 rows=10 loops=1)
         ->  Sort  (cost=32.11..34.61 rows=1000 width=4) (actual time=0.970..0.975 rows=10 loops=1)
               Sort Key: test.id
               Sort Method:  top-N heapsort  Memory: 25kB
               ->  Seq Scan on test  (cost=0.00..10.50 rows=1000 width=4) (actual time=0.027..0.455 rows=1000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=13.00..13.01 rows=1 width=0) (actual time=0.375..0.375 rows=1 loops=10)
           ->  Seq Scan on test t1  (cost=0.00..13.00 rows=1 width=0) (actual time=0.017..0.371 rows=1 loops=10)
                 Filter: (id = t.id)
 Total runtime: 4.845 ms
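
As a quick sanity check on the figures above (plain arithmetic on the reported
numbers, nothing beyond what the two plans show):

```sql
-- subplan executions drop from loops=1000 to loops=10, i.e. 100x fewer;
-- total runtime drops from 158.676 ms to 4.845 ms, i.e. roughly 33x overall
-- (the remaining gap is the fixed cost of the scan, sort and subquery scan)
SELECT 158.676 / 4.845 AS overall_speedup,      -- ~32.7
       1000 / 10       AS subplan_loop_ratio;   -- 100
```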

Using the second form is a reasonable workaround, for sure, but half a year
ago I happened to work on a project where I had to ask the developers to
rewrite a huge pile of analytical queries that way to get reasonable
performance (and there was a lot of outcry and complaining in the process).

And of course it is not always possible to create additional indexes so that
the query goes through an index scan / backward index scan instead of the
sort + limit at the top level.
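
For reference, the general shape of that workaround (table, column and
function names below are placeholders, not from any real schema) is to do the
ORDER BY / LIMIT in a subquery and keep the expensive expression in the outer
select list; because the inner query carries a LIMIT, the planner cannot
flatten it back into the outer query:

```sql
-- placeholder names; slow_computation() stands in for any expensive
-- subquery or stable/immutable function
SELECT t.*,
       slow_computation(t.some_field) AS extra  -- runs only for the N surviving rows
FROM (
    SELECT *
    FROM some_table
    ORDER BY some_field
    LIMIT 10                -- the LIMIT prevents subquery flattening
) AS t
ORDER BY t.some_field;      -- re-state the ordering at the outer level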

Regards,
Maksym

--
Maxim Boguk
Senior PostgreSQL DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
Strength breaks straw, but not everything in our life is straw, and strength
is far from everything, either.
