join ordering

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: join ordering
Date: 2009-04-13 19:12:26
Message-ID: 603c8f070904131212r5fa25e97w985c0646a48d26d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have a query that performs poorly which can be simplified to the
following test case (v8.3.6).

CREATE TABLE foo (id integer, primary key (id));
INSERT INTO foo SELECT generate_series(1,10);
CREATE TABLE bar (id integer, foo_id integer not null references foo (id),
PRIMARY KEY (id));
INSERT INTO bar SELECT g, g % 10 + 1 FROM generate_series(1,10000) g;

ANALYZE;

EXPLAIN ANALYZE
SELECT v.id FROM (VALUES (1, 1)) v (id, bar_id)
LEFT JOIN (bar JOIN foo ON bar.foo_id = foo.id) ON v.bar_id = bar.id;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.23..408.74 rows=1 width=4) (actual
time=0.405..63.585 rows=1 loops=1)
Join Filter: ("*VALUES*".column2 = bar.id)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
(actual time=0.015..0.017 rows=1 loops=1)
-> Hash Join (cost=1.23..283.73 rows=10000 width=4) (actual
time=0.367..49.029 rows=10000 loops=1)
Hash Cond: (bar.foo_id = foo.id)
-> Seq Scan on bar (cost=0.00..145.00 rows=10000 width=8)
(actual time=0.042..15.562 rows=10000 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=4) (actual
time=0.143..0.143 rows=10 loops=1)
-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=4)
(actual time=0.086..0.105 rows=10 loops=1)
Total runtime: 63.893 ms
(9 rows)

This isn't a very good plan. What we should do is first join the
values expression against bar, and then join the resulting rows
against foo. The optimizer doesn't want to do that, and I think the
reason is because it knows that the left join might introduce null
values into the result of (VALUES (...) LEFT JOIN bar) which would
then cause the join against foo to produce different results. But in
practice, since foo.id is not null and = is strict, it's equivalent to
the following, which the planner handles much better.

EXPLAIN ANALYZE
SELECT v.id FROM (VALUES (1, 1)) v (id, bar_id)
LEFT JOIN (bar LEFT JOIN foo ON bar.foo_id = foo.id) ON v.bar_id = bar.id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..8.57 rows=1 width=4) (actual
time=0.079..0.150 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=8) (actual
time=0.058..0.120 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1
width=8) (actual time=0.006..0.008 rows=1 loops=1)
-> Index Scan using bar_pkey on bar (cost=0.00..8.27 rows=1
width=8) (actual time=0.039..0.044 rows=1 loops=1)
Index Cond: ("*VALUES*".column2 = bar.id)
-> Index Scan using foo_pkey on foo (cost=0.00..0.27 rows=1
width=4) (actual time=0.012..0.015 rows=1 loops=1)
Index Cond: (bar.foo_id = foo.id)
Total runtime: 0.312 ms
(8 rows)

...Robert

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2009-04-13 19:14:47 Re: proposal: add columns created and altered to pg_proc and pg_class
Previous Message Kevin Grittner 2009-04-13 19:06:33 Re: Re: [BUGS] BUG #4027: backslash escapingnotdisabledinplpgsql