| From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
|---|---|
| To: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
| Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Convert NOT IN sublinks to anti-joins when safe |
| Date: | 2026-03-04 09:33:24 |
| Message-ID: | CAMbWs4-EJrNijaS9=KqA0m3hWRQ6ZcCkd75etAGW_SZoPpfEXg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Mar 2, 2026 at 9:50 PM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
> The very last rewrite combines both cases. The rewritten query then
> looks like:
>
> SELECT t1.c1 FROM T1 WHERE
> t1.c1 IS NOT NULL AND
> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
> NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
I'm still not convinced this rewrite is correct. As I mentioned
earlier, it breaks down if t2 is empty while t1 contains NULL rows.
For example:
CREATE TABLE t1 (c1 int);
CREATE TABLE t2 (c1 int);
INSERT INTO t1 VALUES (1), (NULL);
SELECT t1.c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);
c1
----
1
(2 rows)
SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL);
c1
----
1
(1 row)
> Seems reasonable to start with the non-NULL variant, though there are
> certainly cases where there's no PK / unique index on the relevant columns.
Yeah. I don't know how to optimize nullable NOT IN clauses. It seems
quite difficult to handle safely purely via query transformations.
Maybe we can explore adding a dedicated Null-Aware Anti-Join execution
node, much like Oracle's approach. But that is definitely beyond the
scope of this current patch.
- Richard
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Álvaro Herrera | 2026-03-04 09:45:16 | Re: [PATCH] Add pg_get_database_ddl() function to reconstruct CREATE DATABASE statement |
| Previous Message | Japin Li | 2026-03-04 09:30:34 | Re: [PATCH] Add pg_get_database_ddl() function to reconstruct CREATE DATABASE statement |