Re: NOT IN subquery optimization

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: "Li, Zheng" <zhelli(at)amazon(dot)com>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NOT IN subquery optimization
Date: 2019-02-25 23:19:57
Message-ID: CAKJS1f9-FsK74UVwOt21V4qBsC1oFC6LLQu6-CeddHjVNKdAbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 26 Feb 2019 at 11:51, Li, Zheng <zhelli(at)amazon(dot)com> wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works directly.

I had a quick look at this and it seems to be broken for the empty
table case I mentioned up thread.

Quick example:

Setup:

create table t1 (a int);
create table t2 (a int not null);
insert into t1 values(NULL),(1),(2);

select * from t1 where a not in(select a from t2);

Patched:
a
---
1
2
(2 rows)

Master:
a
---

1
2
(3 rows)

This will be due to the fact you're adding an a IS NOT NULL qual to
the scan of a:

postgres=# explain select * from t1 where a not in(select a from t2);
QUERY PLAN
------------------------------------------------------------------
Hash Anti Join (cost=67.38..152.18 rows=1268 width=4)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t1 (cost=0.00..35.50 rows=2537 width=4)
Filter: (a IS NOT NULL)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on t2 (cost=0.00..35.50 rows=2550 width=4)
(6 rows)

but as I mentioned, you can't do that as t2 might be empty and there's
no way to know that during planning.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Ramsey 2019-02-25 23:39:49 Re: Allowing extensions to supply operator-/function-specific info
Previous Message Tom Lane 2019-02-25 23:19:43 Re: POC: converting Lists into arrays