Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries
Date: 2011-11-28 00:05:57
Message-ID: CAK-MWwQMx11HP9capawJtzaY1h-HriTAYdmgzPmem=UDqajdcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Nov 28, 2011 at 9:50 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 11/25/2011 06:53 AM, Maxim Boguk wrote:
>
>> I understand that position.
>> However if assumption: " the definition of ORDER BY --- it happens after
>> computing the select list, according to the SQL standard"
>> is correct,
>> then plans like:
>>
>> postgres=# EXPLAIN ANALYZE SELECT * from test order by _data limit 10
>> offset 1000;
>> QUERY PLAN
>> ------------------------------**------------------------------**
>> ------------------------------**------------------------------**
>> --------------
>> Limit (cost=2884.19..2913.03 rows=10 width=8) (actual
>> time=3.584..3.620 rows=10 loops=1)
>> -> Index Scan using random_key on test (cost=0.00..2884190.16
>> rows=1000000 width=8) (actual time=0.103..3.354 rows=1010 loops=1)
>> Total runtime: 3.663 ms
>> (3 rows)
>> should not be used at all.
>>
>
>
> `LIMIT' and `OFFSET' are explicitly defined to compute only that part of
> the SELECT list that is required. If they weren't specifically defined with
> that exception then you'd be right.
>
> LIMIT and OFFSET aren't standard anyway, so Pg can define them to mean
> whatever is most appropriate. The SQL standard is adding new and (as usual)
> painfully clumsily worded features that work like LIMIT and OFFSET, but I
> don't know whether they have the same rules about whether execution of
> functions can be skipped or not.
>
>
> And it is not possible to predict in advance where and when you get hit
>> by that problem.
>>
>
> That's the biggest problem with statistics- and heuristics-based query
> planners in general, but this does seem to be a particularly difficult case.
>
> Setting a cost on the function call that more accurately reflects how
> expensive it is so PostgreSQL will work harder to avoid calling it might
> help. See http://www.postgresql.org/**docs/current/static/sql-**
> createfunction.html<http://www.postgresql.org/docs/current/static/sql-createfunction.html>.
>
> --
> Craig Ringer
>

Change cost for the functions in that case simple ignored by
planner/executor.

I think it should be possible always delay execution functions/subqueries
unrelated to order by list untill limit/offset were applied (even in the
worst case that will provide same performance as today), and no heuristics
need at all.

Hm, one more idea: lets say I call the next sql query -
'SELECT ...,very_log_sure_toasted_field FROM ... ORDER BY (something but
not very_log_toasted_field) LIMIT N'
which will use sort as top node.

Is detoasting of very_log_sure_toasted_field will be performed after
applying ORDER BY... LIMIT N or before it?

If detoasting performed before applying order by/limit, than there exists
large class of queries where delayed/lazy detoasting can be huge
performance win.
If detoasting performed after applying order by/limit, than the same
mechanics can be used to delay subquery/function execution.

PS: Yes I know good response to my complaints: 'patch welcome', but I only
started study of postgresql source code and recovering my C coding skills.
Unfortunately, I don't think I will be ready to start hacking
planner/executor code in short future (planner/executor is most complicated
and easiest to break part of the postgresql code, that is definitely not
newbie task).

--
Maxim Boguk

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Joost Kraaijeveld 2011-11-28 16:42:06 PostgreSQL 9.1 : why is this query slow?
Previous Message Craig Ringer 2011-11-27 22:50:59 Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries