Re: Optimizer questions

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Optimizer questions
Date: 2016-01-29 22:01:34
Message-ID: CAPpHfdtUe1Ow_nMyv5PaiO7K9o5T+-KYg5cMtQZQrQ8zuseOaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> Attached please find improved version of the optimizer patch for LIMIT
> clause.
> Now I try to apply this optimization only for non-trivial columns
> requiring evaluation.
> May be it will be better to take in account cost of this columns
> evaluation but right now I just detect non-variable columns.
>

We may think about more general feature: LIMIT pushdown. In the
Konstantin's patch planner push LIMIT before tlist calculation.
But there are other cases when calculating LIMIT first would be beneficial.
For instance, we can do LIMIT before JOINs. That is possible only for LEFT
JOIN which is not used in filter and ordering clauses. See the example
below.

# create table t1 as (select i, random() v from generate_series(1,1000000)
i);
SELECT 1000000

# create table t2 as (select i, random() v from generate_series(1,1000000)
i);
SELECT 1000000

# explain analyze select * from t1 left join t2 on t1.i = t2.i order by
t1.v limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=87421.64..87421.67 rows=10 width=24) (actual
time=1486.276..1486.278 rows=10 loops=1)
-> Sort (cost=87421.64..89921.64 rows=1000000 width=24) (actual
time=1486.275..1486.275 rows=10 loops=1)
Sort Key: t1.v
Sort Method: top-N heapsort Memory: 25kB
-> Hash Left Join (cost=27906.00..65812.00 rows=1000000
width=24) (actual time=226.180..1366.238 rows=1000000
Hash Cond: (t1.i = t2.i)
-> Seq Scan on t1 (cost=0.00..15406.00 rows=1000000
width=12) (actual time=0.010..77.041 rows=1000000 l
-> Hash (cost=15406.00..15406.00 rows=1000000 width=12)
(actual time=226.066..226.066 rows=1000000 loop
Buckets: 131072 Batches: 1 Memory Usage: 46875kB
-> Seq Scan on t2 (cost=0.00..15406.00 rows=1000000
width=12) (actual time=0.007..89.002 rows=100
Planning time: 0.123 ms
Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10)
t1 left join t2 on t1.i = t2.i;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=37015.89..56171.99 rows=10 width=24) (actual
time=198.478..301.278 rows=10 loops=1)
Hash Cond: (t2.i = t1.i)
-> Seq Scan on t2 (cost=0.00..15406.00 rows=1000000 width=12) (actual
time=0.005..74.303 rows=1000000 loops=1)
-> Hash (cost=37015.77..37015.77 rows=10 width=12) (actual
time=153.584..153.584 rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Limit (cost=37015.64..37015.67 rows=10 width=12) (actual
time=153.579..153.580 rows=10 loops=1)
-> Sort (cost=37015.64..39515.64 rows=1000000 width=12)
(actual time=153.578..153.578 rows=10 loops=1)
Sort Key: t1.v
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on t1 (cost=0.00..15406.00 rows=1000000
width=12) (actual time=0.012..78.828 rows=100
Planning time: 0.132 ms
Execution time: 301.308 ms
(12 rows)

In this example LIMIT pushdown makes query 5 times faster. It would be very
nice if optimizer make this automatically.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-01-29 22:24:07 Re: Sequence Access Method WIP
Previous Message Merlin Moncure 2016-01-29 21:35:52 Re: Move PinBuffer and UnpinBuffer to atomics