Optimization idea

From: Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru>
To: pgsql-performance(at)postgresql(dot)org
Subject: Optimization idea
Date: 2010-04-22 09:25:46
Message-ID: 4BD0161A.4010908@dc.baikal.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Please do this small optimization if it is possible. It seem that the
optimizer have the all information to create a fast plan but it does not
do that.

create temp table t1 (id bigint, t bigint);

insert into t1 values (1, 1);
insert into t1 values (2, 2);
insert into t1 values (2, 3);
insert into t1 values (2, 4);
insert into t1 values (3, 5);

create temp table t2 (id bigint, t bigint);

insert into t2 (id, t)
select g, 2
from generate_series(1, 200) g;

insert into t2 (id, t)
select g, 3
from generate_series(201, 300) g;

insert into t2 (id, t)
select g, 4
from generate_series(301, 400) g;

insert into t2 (id, t)
select g, 1
from generate_series(401, 100000) g;

insert into t2 (id, t)
select g, 5
from generate_series(100001, 100100) g;

create index t_idx on t2(t);

analyze t1;
analyze t2;

explain analyze
select *
from t2
join t1 on t1.t = t2.t
where t1.t = 2

explain analyze
select *
from t2
join t1 on t1.t = t2.t
where t1.id = 3

explain analyze
select *
from t2
where t2.t in (2, 3, 4)

These two queries are completely equal and optimizator should know it as
I see from the plans:

"Hash Join (cost=1.09..2667.09 rows=75000 width=32) (actual
time=0.026..100.207 rows=400 loops=1)"
" Hash Cond: (t2.t = t1.t)"
" -> Seq Scan on t2 (cost=0.00..1541.00 rows=100000 width=16) (actual
time=0.007..47.083 rows=100000 loops=1)"
" -> Hash (cost=1.05..1.05 rows=3 width=16) (actual time=0.011..0.011
rows=3 loops=1)"
" -> Seq Scan on t1 (cost=0.00..1.05 rows=3 width=16) (actual
time=0.005..0.008 rows=3 loops=1)" <-- HERE IS THE PROBLEM. IF THE
ESTIMATED COUNT = 1 OPTIMIZER BUILDS THE CORRECT FAST PLAN, BUT IF THE
ESTIMATION IS GREATER THAN 1 WE HAVE A PROBLEM
" Filter: (id = 2)"
"Total runtime: 100.417 ms"

"Nested Loop (cost=0.00..1024.46 rows=20020 width=32) (actual
time=0.030..0.222 rows=100 loops=1)"
" -> Seq Scan on t1 (cost=0.00..1.05 rows=1 width=16) (actual
time=0.008..0.009 rows=1 loops=1)"
" Filter: (id = 3)"
" -> Index Scan using t_idx on t2 (cost=0.00..773.16 rows=20020
width=16) (actual time=0.016..0.078 rows=100 loops=1)"
" Index Cond: (t2.t = t1.t)"
"Total runtime: 0.296 ms"

"Bitmap Heap Scan on t2 (cost=16.09..556.80 rows=429 width=16) (actual
time=0.067..0.256 rows=400 loops=1)"
" Recheck Cond: (t = ANY ('{2,3,4}'::bigint[]))"
" -> Bitmap Index Scan on t_idx (cost=0.00..15.98 rows=429 width=0)
(actual time=0.056..0.056 rows=400 loops=1)"
" Index Cond: (t = ANY ('{2,3,4}'::bigint[]))"
"Total runtime: 0.458 ms"

An ugly workaround is to add the column t1(t) in the table t2.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2010-04-22 12:14:42 Re: Replacing Cursors with Temporary Tables
Previous Message Scott Carey 2010-04-22 03:13:16 Re: Replacing Cursors with Temporary Tables