Quick Links

Correlated IN/Any Subquery Transformation

From:	"Li, Zheng" <zhelli(at)amazon(dot)com>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Correlated IN/Any Subquery Transformation
Date:	2020-03-20 16:05:59
Message-ID:	9C6BEBC3-C844-4DBC-8864-B0BFE265AEB5@amazon.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi PGHackers:

Currently, correlated IN/Any subquery always gets planned as a SubPlan which leads to poor performance:
postgres=# explain (costs off) select count(*) from s where s.n in (select l.n from l where l.u != s.u);
QUERY PLAN
------------------------------------
Aggregate
-> Seq Scan on s
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on l
Filter: (u <> s.u)

postgres=# select count() from s where s.n in (select l.n from l where l.u != s.u);
Time: 3419.466 ms (00:03.419)
However, you can rewrite the query using exists which will be executed using join. In this example the join plan is more than 3 orders of magnitudes faster than the SubPlan:
postgres=# explain (costs off) select count(*) from s where exists (select 1 from l where l.n = s.n and l.u != s.u);
QUERY PLAN
---------------------------------------
Aggregate
-> Merge Semi Join
Merge Cond: (s.n = l.n)
Join Filter: (l.u <> s.u)
-> Index Scan using s_n on s
-> Index Scan using l_n on l

postgres=# select count() from s where exists (select 1 from l where l.n = s.n and l.u != s.u);
Time: 1.188 ms

Table s has 10 rows, table l has 1, 000, 000 rows.

This patch enables correlated IN/Any subquery to be transformed to join, the transformation is allowed only when the correlated Var is in the where clause of the subquery. It covers the most common correlated cases and follows the same criteria that is followed by the correlated Exists transformation code.

Here is the new query plan for the same correlated IN query:
postgres=# explain (costs off) select count(*) from s where s.n in (select l.n from l where l.u != s.u);
QUERY PLAN

Aggregate
-> Merge Semi Join
Merge Cond: (s.n = l.n)
Join Filter: (l.u <> s.u)
-> Index Scan using s_n on s
-> Index Scan using l_n on l

postgres=# select count(*) from s where s.n in (select l.n from l where l.u != s.u);
Time: 1.693 ms
________________________________
Also the patch introduces a new GUC enable_correlated_any_transform (on by default) to guard the optimization. Test cases are included in the patch. Comments are welcome!

-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL

Attachment	Content-Type	Size
correlated_any_transformation.patch	application/octet-stream	47.7 KB

Responses

Re: Correlated IN/Any Subquery Transformation at 2020-03-20 16:42:42 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2020-03-20 16:36:07	Re: where EXEC_BACKEND is defined
Previous Message	Oleksii Kliukin	2020-03-20 16:02:15	Issues with building cpp extensions on PostgreSQL 10+