Skip site navigation (1) Skip section navigation (2)

PostgreSQL NOT IN performance

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: PostgreSQL NOT IN performance
Date: 2008-11-19 11:51:47
Message-ID: 331e40660811190351n6bde6caic5fd8cc3a0f777d2@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hello.

It's second query rewrite postgresql seems not to handle - making EXCEPT
from NOT IT.
Here is an example:
Preparation:

drop table if exists t1;
drop table if exists t2;
create temporary table t1(id) as
select
(random()*100000)::int from generate_series(1,200000) a(id);

create temporary table t2(id) as
select
(random()*100000)::int from generate_series(1,100000) a(id);
analyze t1;
analyze t2;

Query 1:
select * from t1 where id not in (select id from t2);
Plan:
"Seq Scan on t1  (cost=1934.00..164105319.00 rows=100000 width=4)"
"  Filter: (NOT (subplan))"
"  SubPlan"
"    ->  Materialize  (cost=1934.00..3325.00 rows=100000 width=4)"
"          ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000 width=4)"

Query 2 (gives same result as Q1):
select * from t1 except all (select id from t2);
Plan:
"SetOp Except All  (cost=38721.90..40221.90 rows=30000 width=4)"
"  ->  Sort  (cost=38721.90..39471.90 rows=300000 width=4)"
"        Sort Key: "*SELECT* 1".id"
"        ->  Append  (cost=0.00..7328.00 rows=300000 width=4)"
"              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..4885.00
rows=200000 width=4)"
"                    ->  Seq Scan on t1  (cost=0.00..2885.00 rows=200000
width=4)"
"              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..2443.00
rows=100000 width=4)"
"                    ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000
width=4)"

If I am correct, planner simply do not know that he can rewrite NOT IN as
"EXCEPT ALL" operator, so all NOT INs when list of values to remove is long
takes very much time.
So the question is: I am willing to participate in postgresql development
because it may be easier to fix planner then to rewrite all my queries :).
How can I? (I mean to work on query planner enhancements by providing new
options of query rewrite, not to work on other thing nor on enhancing
planner in other ways, like better estimations of known plans).

Responses

pgsql-performance by date

Next:From: DANIEL CRISTIAN CRUZDate: 2008-11-19 12:22:06
Subject: Re: PostgreSQL NOT IN performance
Previous:From: Tomas VondraDate: 2008-11-19 00:49:12
Subject: Re: Performance and IN clauses

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group