Pushing down join clauses into subqueries

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Pushing down join clauses into subqueries
Date: 2018-06-06 19:22:19
Message-ID: 7105c8f3-e16f-ee54-9ce9-c4ac2384791e@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

It would be nice, if we could push down join quals into subqueries. For
example:

create table small_table (i int);
create table big_table (i int, j int);

insert into small_table values (1), (2); -- two rows
insert into big_table select g/10, g from generate_series(1, 100000) g;
-- million rows

create index on big_table(i);

select *
from small_table,
(select i, count(*) from big_table group by i) as sq
where small_table.i = sq.i;

Currently, we will fully materialize the subquery, and then filter the
rows find the rows that match the outer query:

QUERY PLAN

-----------------------------------------------------------------------------------
Hash Join (cost=1949.50..2017.08 rows=2550 width=16)
Hash Cond: (small_table.i = big_table.i)
-> Seq Scan on small_table (cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=1947.00..1947.00 rows=200 width=12)
-> HashAggregate (cost=1943.00..1945.00 rows=200 width=12)
Group Key: big_table.i
-> Seq Scan on big_table (cost=0.00..1443.00
rows=100000 width=4)
(7 rows)

For this query, a nested loop join, with an index scan on big_table,
would be a better plan. Doing that would require creating parameterized
paths, planning the subquery multiple times, with and without the
pushed-down join quals.

However, there's one special case: What if the subquery is LATERAL? In
that case, the subquery is already parameterized. We could push down
join quals, which refer the same "other" relations that are already
laterally referenced, without making it any more lateral.

Attached is a patch to do that (the first patch). I'm not too familiar
with this code; does it look sane?

One thing that I wasn't clear on, is the intended behavior of
ReplaceVarsFromTargetList(). In the patch, when a join qual is pushed
down, the Vars in the join clause that refer to the subquery's output,
are replaced with the subquery's target list entries. That's the same
thing we do when pushing down regular, non-join quals, too. But when
pushing down a join qual, the qual will also include Vars for other
relations on the same level, not just Vars for the subquery. When the
qual is pushed down to the subquery, the Vars referring to other
relations need to have their varlevelsup incremented. In the patch, I
solved this by first calling IncrementVarSublevelsUp() on the qual, to
bump up varlevelsup for all Vars, and then ReplaceVarsFromTargetList(),
with sublevels_up = 1. But that way, when ReplaceVarsFromTargetList()
replaces Vars with the target list entries, it also adjusts varlevelsup
in any Vars in the target list entry. I changed it to not do that
anymore. All the existing callers call ReplaceVarsFromTargetList() with
sublevels_up == 0, so they shouldn't be affected, but I wonder what the
original intention here was?

Attached is also a second, much more work-in-progress patch, that
expands the push-down support to non-LATERAL subqueries. Yes, that means
that the subquery is planned multiple times. I think that needs some
further heuristics, or perhaps we could pass the list of "potential"
join clauses to the subquery when it's planned for the first time, and
have subquery_planner() tell which of them might be useful, e.g. because
there are indexes to back them.

- Heikki

Attachment Content-Type Size
0001-Push-down-join-quals-into-lateral-subqueries.patch text/x-patch 24.2 KB
0002-WIP-Allow-pushing-join-quals-down-into-subqueries.patch text/x-patch 26.4 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2018-06-06 19:30:40 Re: buildfarm vs code
Previous Message Jeremy Schneider 2018-06-06 18:51:19 Re: Code of Conduct plan