Subpar Execution Plan

From: Jens-Wolfhard Schicke <drahflow(at)gmx(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Subpar Execution Plan
Date: 2007-11-06 20:48:12
Message-ID: 4730D30C.3090606@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have a two relevant tables:

fastgraph=# \d object
Table "public.object"
Column | Type | Modifiers
- --------+--------+----------------------------------------------
id | bigint | not null default nextval('id_seq'::regclass)
Indexes:
"object_id_idx" UNIQUE, btree (id)

actually, this table is partitioned into object, link, representation and format, the latter three of which carry some extra fields,
which are not selected this time. "id" is indexed in every one.

fastgraph=# \d link
Table "public.link"
Column | Type | Modifiers
- -----------+------------------+----------------------------------------------
id | bigint | not null default nextval('id_seq'::regclass)
s | bigint | not null
e | bigint | not null
intensity | double precision | not null
Indexes:
"link_id_idx" UNIQUE, btree (id)
"link_e_idx" btree (e)
"link_s_idx" btree (s)
"link_se_idx" btree (s, e)
Inherits: object

And the query

fastgraph=# explain
select distinct o.*
from object o, link l1, link l2
where (o.id = l1.s and l2.s = l1.e and l2.e = 8693)
or (o.id = l1.e and l2.s = l1.s and l2.e = 8693)
or (o.id = l1.s and l2.e = l1.e and l2.s = 8693)
or (o.id = l1.e and l2.e = l1.s and l2.s = 8693);
QUERY PLAN
- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=168076109270.14..168119432747.82 rows=200 width=8)
-> Sort (cost=168076109270.14..168097771008.98 rows=8664695536 width=8)
Sort Key: o.id
-> Nested Loop (cost=12.74..166290504976.97 rows=8664695536 width=8)
Join Filter: (((o.id = l1.s) AND (l2.s = l1.e) AND (l2.e = 8693)) OR ((o.id = l1.e) AND (l2.s = l1.s) AND (l2.e = 8693)) OR ((o.id = l1.s) AND (l2.e = l1.e) AND (l2.s = 8693)) OR ((o.id = l1.e) AND (l2.e = l1.s) AND (l2.s = 8693)))
-> Nested Loop (cost=0.00..270060645.64 rows=9889858512 width=24)
-> Seq Scan on link l2 (cost=0.00..120087.40 rows=1908 width=16)
Filter: ((e = 8693) OR (e = 8693) OR (s = 8693) OR (s = 8693))
-> Append (cost=0.00..89644.64 rows=5183364 width=8)
-> Seq Scan on object o (cost=0.00..4584.51 rows=317751 width=8)
-> Seq Scan on link o (cost=0.00..76189.70 rows=4389770 width=8)
-> Seq Scan on format o (cost=0.00..1.02 rows=2 width=8)
-> Seq Scan on representation o (cost=0.00..8869.41 rows=475841 width=8)
-> Bitmap Heap Scan on link l1 (cost=12.74..16.75 rows=1 width=16)
Recheck Cond: (((o.id = l1.s) AND (l2.s = l1.e)) OR ((l2.s = l1.s) AND (o.id = l1.e)) OR ((o.id = l1.s) AND (l2.e = l1.e)) OR ((l2.e = l1.s) AND (o.id = l1.e)))
-> BitmapOr (cost=12.74..12.74 rows=1 width=0)
-> Bitmap Index Scan on link_se_idx (cost=0.00..3.18 rows=1 width=0)
Index Cond: ((o.id = l1.s) AND (l2.s = l1.e))
-> Bitmap Index Scan on link_se_idx (cost=0.00..3.18 rows=1 width=0)
Index Cond: ((l2.s = l1.s) AND (o.id = l1.e))
-> Bitmap Index Scan on link_se_idx (cost=0.00..3.18 rows=1 width=0)
Index Cond: ((o.id = l1.s) AND (l2.e = l1.e))
-> Bitmap Index Scan on link_se_idx (cost=0.00..3.18 rows=1 width=0)
Index Cond: ((l2.e = l1.s) AND (o.id = l1.e))
(24 rows)

These costs are unacceptable for my application. (obviously)

My question is just, why does the planner think it a good idea to join the unrelated
tables first using a nested loop over a sequence scan? That seems like the worst
selectivity possible. It seems like a bug to me, but maybe I am just overlooking something...

Can the planner distribute the DISTINCT down the join tree somehow?

Just for reference the following query seems equivalent:

fastgraph=# explain select distinct o.* from object o where o.id in (
select l1.s from link l1 where l1.e in (
select l2.s from link l2 where l2.e = 8693
union
select l2.e from link l2 where l2.s = 8693
)
union
select l1.e from link l1 where l1.s in (
select l2.s from link l2 where l2.e = 8693
union
select l2.e from link l2 where l2.s = 8693
)
)
;

QUERY PLAN
- -------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=358267.90..2698389.18 rows=200 width=8)
-> Nested Loop (cost=358267.90..2685430.77 rows=5183364 width=8)
Join Filter: (o.id = l1.s)
-> Unique (cost=358267.90..364484.89 rows=1243398 width=8)
-> Sort (cost=358267.90..361376.39 rows=1243398 width=8)
Sort Key: l1.s
-> Append (cost=2612.99..215396.61 rows=1243398 width=8)
-> Hash Join (cost=2612.99..105950.31 rows=1068599 width=8)
Hash Cond: (l1.e = l2.s)
-> Seq Scan on link l1 (cost=0.00..76189.70 rows=4389770 width=16)
-> Hash (cost=2601.06..2601.06 rows=954 width=8)
-> Unique (cost=2586.75..2591.52 rows=954 width=8)
-> Sort (cost=2586.75..2589.14 rows=954 width=8)
Sort Key: l2.s
-> Append (cost=0.00..2539.54 rows=954 width=8)
-> Index Scan using link_e_idx on link l2 (cost=0.00..1838.03 rows=773 width=8)
Index Cond: (e = 8693)
-> Bitmap Heap Scan on link l2 (cost=6.36..691.97 rows=181 width=8)
Recheck Cond: (s = 8693)
-> Bitmap Index Scan on link_s_idx (cost=0.00..6.31 rows=181 width=0)
Index Cond: (s = 8693)
-> Hash Join (cost=2612.99..97012.31 rows=174799 width=8)
Hash Cond: (l1.s = l2.s)
-> Seq Scan on link l1 (cost=0.00..76189.70 rows=4389770 width=16)
-> Hash (cost=2601.06..2601.06 rows=954 width=8)
-> Unique (cost=2586.75..2591.52 rows=954 width=8)
-> Sort (cost=2586.75..2589.14 rows=954 width=8)
Sort Key: l2.s
-> Append (cost=0.00..2539.54 rows=954 width=8)
-> Index Scan using link_e_idx on link l2 (cost=0.00..1838.03 rows=773 width=8)
Index Cond: (e = 8693)
-> Bitmap Heap Scan on link l2 (cost=6.36..691.97 rows=181 width=8)
Recheck Cond: (s = 8693)
-> Bitmap Index Scan on link_s_idx (cost=0.00..6.31 rows=181 width=0)
Index Cond: (s = 8693)
-> Append (cost=0.00..1.81 rows=4 width=8)
-> Index Scan using object_id_idx on object o (cost=0.00..0.31 rows=1 width=8)
Index Cond: (o.id = l1.s)
-> Index Scan using link_id_idx on link o (cost=0.00..0.89 rows=1 width=8)
Index Cond: (o.id = l1.s)
-> Index Scan using format_id_idx on format o (cost=0.00..0.27 rows=1 width=8)
Index Cond: (o.id = l1.s)
-> Index Scan using representation_id_idx on representation o (cost=0.00..0.34 rows=1 width=8)
Index Cond: (o.id = l1.s)
(44 rows)

Costs are better (and more practical).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHMNMLzhchXT4RR5ARAnhDAKCgSK/2SIb8mwDnjZgGxRtYdWJ+pwCgoEMW
zp8Mz52WeSZuNLpFGz8NPJI=
=FtZd
-----END PGP SIGNATURE-----

Browse pgsql-performance by date

  From Date Subject
Next Message Guillaume Smet 2007-11-07 12:53:16 Estimation problem with a LIKE clause containing a /
Previous Message Greg Smith 2007-11-06 18:10:10 Re: dell versus hp