NOT IN Doesn't use Anti Joins?

From: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: NOT IN Doesn't use Anti Joins?
Date: 2009-12-17 14:02:57
Message-ID: 751261b20912170602m49c2711ckbb23df5fcf6b682c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm sure there is a good reason why NOT IN will not use an Anti-Join
plan equivalent to NOT EXISTS due to NULL handling, but in this
particular case the value being compared is in the PRIMARY KEY of both
structures being joined.

The NOT IN plan was killed after 10 minutes. The NOT EXISTS plan
returned data in roughly 10ms.

Is there a reason why the NOT IN plan could not use Anti-Joins when
the column being compared against is guaranteed to be NOT NULL? Too
much planner overhead to determine nullness of the column?

sk=# explain select * from source_reb_listing where listing_id not in
(select listing_id from source_reb_listing_specs) order by file_id
desc limit 5;

QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=729015.39..3420463.83 rows=5 width=28)
-> Index Scan Backward using source_reb_listing_fileid_idx on
source_reb_listing (cost=729015.39..169537219655.96 rows=314954
width=28)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=729015.39..1185280.74 rows=32810035 width=8)
-> Seq Scan on source_reb_listing_specs
(cost=0.00..568040.35 rows=32810035 width=8)
(6 rows)

sk=# explain select * from source_reb_listing where not exists (select
* from source_reb_listing_specs as t where t.listing_id =
source_reb_listing.listing_id) order by file_id desc limit 5;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..35.31 rows=5 width=28)
-> Nested Loop Anti Join (cost=0.00..3880495.87 rows=549496 width=28)
-> Index Scan Backward using source_reb_listing_fileid_idx
on source_reb_listing (cost=0.00..1107142.20 rows=629907 width=28)
-> Index Scan using source_reb_listing_specs_pkey on
source_reb_listing_specs t (cost=0.00..1592.74 rows=408 width=8)
Index Cond: (t.listing_id = source_reb_listing.listing_id)
(5 rows)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-12-17 14:16:59 Re: Largeobject Access Controls (r2460)
Previous Message Kevin Grittner 2009-12-17 13:38:37 Re: Update on true serializable techniques in MVCC