Bad mis-costing of Merge Left Join in 8.0.1

From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Bad mis-costing of Merge Left Join in 8.0.1
Date: 2005-04-03 20:38:20
Message-ID: slrnd50l1s.2ilg.andrew+nonews@trinity.supernews.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

This came up from a user in the IRC channel; while examining his EXPLAIN
ANALYZE output, we found some rather radical discrepancies in the costs
for a merge join, which was resulting in very suboptimal plans. I was
able to reduce his data to a test case as follows:

create table mjtest1 (id integer primary key, value1 integer,
padding float8[]);
create table mjtest2 (id integer primary key, value2 integer);
create index mjtest1_idx on mjtest1(value1);
insert into mjtest1
select i,i%100,
ARRAY[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
from generate_series(1,80000) as s(i);
insert into mjtest2 select i,i from generate_series(1,10) as s(i);
analyze;
set enable_mergejoin=true;
explain analyze select * from mjtest1 left join mjtest2 using (id)
where mjtest1.value1 = 5;
set enable_mergejoin=false;
explain analyze select * from mjtest1 left join mjtest2 using (id)
where mjtest1.value1 = 5;

On my 8.0.1 test setup, I get the following plans:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=1.27..6.98 rows=779 width=276) (actual time=0.318..114.686 rows=800 loops=1)
Merge Cond: ("outer".id = "inner".id)
-> Index Scan using mjtest1_pkey on mjtest1 (cost=0.00..4402.00 rows=779 width=272) (actual time=0.103..103.150 rows=800 loops=1)
Filter: (value1 = 5)
-> Sort (cost=1.27..1.29 rows=10 width=8) (actual time=0.144..0.185 rows=10 loops=1)
Sort Key: mjtest2.id
-> Seq Scan on mjtest2 (cost=0.00..1.10 rows=10 width=8) (actual time=0.019..0.068 rows=10 loops=1)
Total runtime: 118.181 ms
(8 rows)

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1.12..2785.16 rows=779 width=276) (actual time=0.314..15.002 rows=800 loops=1)
Hash Cond: ("outer".id = "inner".id)
-> Index Scan using mjtest1_idx on mjtest1 (cost=0.00..2780.13 rows=779 width=272) (actual time=0.027..6.032 rows=800 loops=1)
Index Cond: (value1 = 5)
-> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.122..0.122 rows=0 loops=1)
-> Seq Scan on mjtest2 (cost=0.00..1.10 rows=10 width=8) (actual time=0.018..0.065 rows=10 loops=1)
Total runtime: 18.415 ms
(7 rows)

The cost for the Merge Left Join is clearly preposterous, since the join
cost can't be lower than the cost of the left branch, as it is an outer
join and therefore that branch must be run to completion. I do not fully
understand the cost estimation code for the merge join, but it appears to
be reducing its total cost estimate below that of the child nodes on the
assumption that the join can be aborted early, which is clearly not the
case for outer joins.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2005-04-04 01:10:15 Re: Bad mis-costing of Merge Left Join in 8.0.1
Previous Message Magnus Hagander 2005-04-03 17:13:26 Re: windows installation