From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | PoC: adding CustomJoin, separate from CustomScan |
Date: | 2025-07-24 13:04:17 |
Message-ID: | b8fa8ed4-4444-404f-82f8-8b86c6e82d4d@vondra.me |
Lists: | pgsql-hackers |
Hi,
I've been experimenting with executor nodes inspired by papers on
executor robustness (think "algorithms that don't explode if an estimate
is off"). I decided to use the CustomScan API for this, because it seemed
like an ideal solution for my experiments - convenient, isolated, easy
to try on other releases, ...
I'm going to discuss a couple of issues with using CustomScan for joins,
and propose some improvements to the CustomScan API to address them.
I'd welcome feedback on the proposed changes, ideas for alternative
approaches, etc. Of course, I may be wrong or missing something about
the CustomScan design, so feel free to point that out.
For the "scan" algorithm (SmoothScan [1]), this mostly worked fine. I
ended up copying a lot of the code that matches clauses to an index
from indexscan planning, and that's a bit ugly. But the CustomScan API
does not promise to address that, so it did not surprise me, and I can
deal with it (at least for now). Other than that, CustomScan worked
fine for my experimental "scan" method.
But when implementing a custom join (generalized join [2]), it was
quite a different story. CustomScan claims to support joins: you just
need to use the set_join_pathlist hook, and set a couple of fields in
the plan/executor nodes differently (e.g. leave scanrelid=0).
And that kinda works for the planning phase, but at execution time it
turned out to be much trickier. The main hurdle I ran into is how to
construct the result tuple.
In regular joins, you can do that by setting
ecxt_innertuple/ecxt_outertuple, and calling ExecProject(). Or something
along those lines. But for CustomScan joins, that's not possible - the
targetlist is modified so that all the Vars have INDEX_VAR, as pointed
out by a comment in primnodes.h:
> In ForeignScan and CustomScan plan nodes, INDEX_VAR is abused to
> signify references to columns of a custom scan tuple type.
Which makes sense, because while the CustomScan can have nested plans,
it does not have a concept of an explicit inner/outer plan.
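A toy model may help illustrate the problem described above. The sketch
below is self-contained C and does not use any PostgreSQL headers; all
names (ToyVar, ToyExprContext, toy_project, ...) are hypothetical
stand-ins invented for this illustration, not the real executor types.
It only shows the general idea: a targetlist whose Vars point at
inner/outer can be projected from the two input tuples, while a
targetlist rewritten to reference only the node's own scan tuple cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the PostgreSQL types. */
typedef enum ToyVarno
{
    TOY_INNER_VAR,              /* column of the inner input tuple */
    TOY_OUTER_VAR,              /* column of the outer input tuple */
    TOY_INDEX_VAR               /* column of the node's own scan tuple */
} ToyVarno;

typedef struct ToyVar
{
    ToyVarno    varno;          /* which tuple the column is read from */
    int         attno;          /* 1-based column number in that tuple */
} ToyVar;

typedef struct ToyTuple
{
    const int  *values;
    int         natts;
} ToyTuple;

/* Loosely mirrors the idea of ecxt_innertuple / ecxt_outertuple / scan tuple. */
typedef struct ToyExprContext
{
    const ToyTuple *innertuple;
    const ToyTuple *outertuple;
    const ToyTuple *scantuple;
} ToyExprContext;

/*
 * Evaluate a targetlist of plain Vars into out[].
 * Returns 0 on success, or -1 if a Var references a tuple that was not
 * provided or a column that does not exist.
 */
int
toy_project(const ToyExprContext *ctx, const ToyVar *tlist, int ntlist, int *out)
{
    assert(ctx != NULL && tlist != NULL && out != NULL);

    for (int i = 0; i < ntlist; i++)
    {
        const ToyTuple *src = NULL;

        switch (tlist[i].varno)
        {
            case TOY_INNER_VAR:
                src = ctx->innertuple;
                break;
            case TOY_OUTER_VAR:
                src = ctx->outertuple;
                break;
            case TOY_INDEX_VAR:
                src = ctx->scantuple;
                break;
        }

        if (src == NULL || tlist[i].attno < 1 || tlist[i].attno > src->natts)
            return -1;

        out[i] = src->values[tlist[i].attno - 1];
    }
    return 0;
}
```

In this model, a join node that only has an inner and an outer tuple at
hand can project a targetlist made of TOY_INNER_VAR/TOY_OUTER_VAR
entries directly, but a targetlist made of TOY_INDEX_VAR entries fails
unless the node first assembles a scan tuple itself.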
It seemed to me I'd have to essentially build the tuples "on my own",
which seems quite tricky and inconvenient. And also a bit against the
idea of CustomScan shielding extensions from this kind of "core" stuff.
I may be entirely wrong, of course. Perhaps I'm missing something, and
there's a simple way to do this? I tried to look at existing extensions
implementing joins through CustomScan, but there are not that many, and
I haven't found any good solution.
I also tried reading through the ~2014 threads related to CustomScan,
and how it got modified to allow joins. But I don't see this discussed
there either. It seems to me the CustomScan received the minimum amount
of "tweaks" to allow joins, but it's not very practical.
I realize ForeignScan supports joins in a very similar way (i.e. you
leave scanrelid=0, etc.). But I think there's a difference - the foreign
join code is not really supposed to build the tuples, it gets the
"formed" tuples from somewhere else, more or less. For example
postgres_fdw deparses a query, sends it somewhere, that other node does
the actual join, builds the tuple from inner/outer, and sends it back.
The postgres_fdw code does not need to worry about mapping the target
list to inner/outer etc. This does not work for CustomScan I think
(unless it's doing the same sort of query offloading).
I asked on Discord about the right way to do this with CustomScan and
joins, and the response was that this may not have been thought through
very carefully, and some improvements may be necessary to make it more
convenient.
So I decided to give this a try. The way I see it, most of the issues
stem from grafting joins onto an interface that's designed for scans.
The whole sequence of custom nodes:
    CustomPath -> CustomScan -> CustomScanState
is based on scans. CustomPath "inherits" from Path, CustomScan from
"Scan", CustomScanState from "ScanState". It's not clear to me how to
make this work with "JoinPath", "JoinPlan" and "JoinState" in a
reasonable way.
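The "inheritance" mentioned above is plain C struct embedding: the
parent struct is the first member of the child. The following sketch
uses simplified, hypothetical Toy* structs (not the real PostgreSQL
definitions, and not the layout used by the patch, which is an
assumption here) to show why a state node that embeds a scan state has
no natural place for join-specific fields, and how a separate
join-based state node would reach the projection info through a
js.ps path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins; real structs have many more fields. */
typedef struct ToyProjInfo
{
    int         ncols;
} ToyProjInfo;

typedef struct ToyPlanState
{
    ToyProjInfo *ps_ProjInfo;   /* projection shared by all node types */
} ToyPlanState;

typedef struct ToyScanState
{
    ToyPlanState ps;            /* "parent" must be the first member */
    int         scanrelid;      /* scan-specific field */
} ToyScanState;

typedef struct ToyJoinState
{
    ToyPlanState ps;            /* "parent" must be the first member */
    int         jointype;       /* join-specific field */
} ToyJoinState;

/* Scan-based custom node: gets scan fields, but nothing join-related. */
typedef struct ToyCustomScanState
{
    ToyScanState ss;
    void       *custom_private;
} ToyCustomScanState;

/* Join-based custom node (assumed layout): gets the join fields instead. */
typedef struct ToyCustomJoinState
{
    ToyJoinState js;
    void       *custom_private;
} ToyCustomJoinState;

/* Both node types can be viewed as a plain plan state. */
ToyPlanState *
toy_planstate_of_scan(ToyCustomScanState *node)
{
    assert(node != NULL);
    return &node->ss.ps;
}

ToyPlanState *
toy_planstate_of_join(ToyCustomJoinState *node)
{
    assert(node != NULL);
    return &node->js.ps;
}

/* The projection info is reached through the embedded join state. */
ToyProjInfo *
toy_join_projinfo(ToyCustomJoinState *node)
{
    assert(node != NULL);
    return node->js.ps.ps_ProjInfo;
}
```

Because the embedded parent sits at offset zero, a pointer to either
custom state can also be cast to a pointer to the plan state, which is
the usual way generic code handles all node types uniformly.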
I suppose some of the "join" data can be stashed in the private fields
of the structs, but then the various planner/executor parts need to know
about that in some way. How else would setrefs do the right thing when
translating the targetlist into inner/outer references? (Maybe it could
be done in PlanCustomPath, but that seems too early?)
The "proper" way seems to be to have separate nodes for joins:
    CustomJoinPath -> CustomJoin -> CustomJoinState
The attached PoC patch does that (except for the CustomJoinPath, it can
do with CustomPath for now). It's more or less a copy-paste, adjusting
all the places with "case CustomScan" to also deal with "CustomJoin" -
either in the same way, or sometimes in a way that works for joins.
It's fairly mechanical.
With this patch, my custom join can simply do
    econtext->ecxt_outertuple = outer;
    econtext->ecxt_innertuple = inner;
    return ExecProject(node->js.ps.ps_ProjInfo);
and it works.
One thing that surprised me a bit is that there's no testing extension
implementing a simple custom scan/join. So it's hard to show this :-(
I'm sure there are places that need more work (some of which are marked
with FIXME, and I probably missed some). But it surprised me how small
the patch is - most of it is mechanical adjustment of switches. It
would get a bit larger, e.g. due to sgml docs (which the PoC patch does
not update).
The patch also renames a couple of structs to have "Scan" in them, e.g.
CustomExecMethods is now CustomScanExecMethods. This is necessary
because the methods get the "plan state" of the particular type (i.e.
CustomScanState vs. CustomJoinState), etc. I guess we could do with some
"shared" state, but it seems like a recipe for confusion, and I'm not
sure it'd remove that much code anyway.
So that's what I have for now.
Note: I mentioned some extensions implementing SmoothScan/G-join. I plan
to publish those once I polish them a bit more. They're more research
than something ready to use right now.
regards
[1] https://scholar.harvard.edu/files/stratos/files/smooth_vldbj.pdf
[2] https://dl.gi.de/server/api/core/bitstreams/ce8e3fab-0bac-45fc-a6d4-66edaa52d574/content
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
v1-0001-PoC-introduce-CustomJoin-as-separate-from-CustomS.patch | text/x-patch | 36.0 KB |