Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries
Date: 2011-11-24 22:53:49
Message-ID: CAK-MWwT5+xaQhydZ0TZo_U+Q=Rb9C2H=m+f4LuVqxkk7BU3CVQ@mail.gmail.com
Lists: pgsql-performance

I understand that position.
However, if the assumption that "the definition of ORDER BY --- it happens after computing the select list, according to the SQL standard" is correct, then plans like:

postgres=# EXPLAIN ANALYZE SELECT * from test order by _data limit 10 offset 1000;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2884.19..2913.03 rows=10 width=8) (actual time=3.584..3.620 rows=10 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16 rows=1000000 width=8) (actual time=0.103..3.354 rows=1010 loops=1)
 Total runtime: 3.663 ms
(3 rows)
should not be used at all.

In reality I was bitten by the following scenario (this is a simplified case):

postgres=# CREATE TABLE test as (select random() as _data from (select * from generate_series(1,1000000)) as t);
SELECT 1000000
postgres=# CREATE INDEX random_key on test(_data);
CREATE INDEX
postgres=# analyze test;
ANALYZE
postgres=# set seq_page_cost to 1;
SET
postgres=# set random_page_cost to 4;
SET
postgres=# set effective_cache_size to '16MB';
SET

Now:
postgres=# EXPLAIN analyze SELECT *,(select pg_sleep(10)) from test order by _data limit 10;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.01..28.85 rows=10 width=8) (actual time=10001.132..10001.198 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=10001.076..10001.078 rows=1 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16 rows=1000000 width=8) (actual time=10001.129..10001.188 rows=10 loops=1)
 Total runtime: 10001.252 ms
(5 rows)

This is OK.

postgres=# EXPLAIN analyze SELECT *,(select pg_sleep(10)) from test order by _data limit 10 offset 10000;
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=28841.91..28870.76 rows=10 width=8) (actual time=10037.850..10037.871 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=10001.040..10001.041 rows=1 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16 rows=1000000 width=8) (actual time=10001.094..10036.022 rows=10010 loops=1)
 Total runtime: 10037.919 ms
(5 rows)

This is still OK.

postgres=# EXPLAIN SELECT *,(select pg_sleep(10)) from test order by _data limit 10 offset 100000;
                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=102723.94..102723.96 rows=10 width=8)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Sort  (cost=102473.92..104973.92 rows=1000000 width=8)
         Sort Key: _data
         ->  Seq Scan on test  (cost=0.00..14425.00 rows=1000000 width=8)
(6 rows)

Oops, and here the project is in trouble.

And it is not possible to predict in advance where and when you will get hit by that problem.
E.g., statements that are usually fast can become slow as a snail for some argument values, once the DB switches from an index scan to a seq scan + sort.

The only way to prevent that is to always write all queries the way you suggested.
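For the test case above, that rewrite would look something like the sketch below (expensive_fn() is a hypothetical stand-in for a per-row expensive function; with it in the inner select list, a seq scan + sort plan would evaluate it for all 1000000 rows, while this form evaluates it only for the 10 surviving rows, whatever plan the planner picks inside the subquery):

```sql
-- Sketch of the suggested rewrite: the inner subquery applies
-- ORDER BY / LIMIT / OFFSET first, so expensive_fn() (hypothetical)
-- runs only for the 10 rows that survive the limit.
SELECT ss.*, expensive_fn(ss._data)
FROM (SELECT * FROM test
      ORDER BY _data
      LIMIT 10 OFFSET 100000) AS ss;
```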

Kind Regards,
Maksym

On Fri, Nov 25, 2011 at 4:05 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> writes:
> > Is here any reason why Postgresql calculates subqueries/storable
> procedures
> > in select list before applying ORDER BY / LIMIT?
>
> Well, that's the definition of ORDER BY --- it happens after computing
> the select list, according to the SQL standard. We try to optimize this
> in some cases but you can't really complain when we don't. Consider
> putting the expensive function outside the ORDER BY/LIMIT, ie
>
> select ..., expensive_fn() from (select ... order by ... limit ...) ss;
>
> regards, tom lane
>

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

MoiKrug: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
