| From: | Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru> |
|---|---|
| To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Cc: | solaimurugan vellaipandiyan <drsolaimurugan(dot)v(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, g(dot)kashkin(at)postgrespro(dot)ru, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Function scan FDW pushdown |
| Date: | 2026-05-12 08:09:23 |
| Message-ID: | 7e2abeb3ea9bc7ca024f4e457dce33f5@postgrespro.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Alexander Korotkov писал(а) 2026-05-11 01:22:
> Hi!
>
> On Fri, May 8, 2026 at 9:03 AM Alexander Pyhalov
> <a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
>>
>> solaimurugan vellaipandiyan писал(а) 2026-05-08 08:17:
>> > Hi,
>> >
>> > I tested the v3 patch on current master using a postgres_fdw loopback
>> > setup.
>> >
>> > The patch mostly applied cleanly for me, with only a small manual
>> > conflict resolution needed in postgres_fdw.c (missing
>> > optimizer/clauses.h include). PostgreSQL built and started
>> > successfully after that.
>> >
>> > For testing, I used queries involving generate_series() together with
>> > a foreign table, for example:
>> >
>> > SELECT *
>> > FROM ft
>> > JOIN generate_series(1,3) g
>> > ON ft.id = g;
>> >
>> > I also tried:
>> > 1. implicit join syntax
>> > 2. LATERAL usage
>> > 3. disabling hashjoin/mergejoin to force different join paths
>> >
>> > In all cases, the plans still showed a local Function Scan and local
>> > join execution. The Remote SQL remained:
>> > SELECT id FROM public.localtab
>> > So I wasn't able to observe function scan pushdown with these
>> > testcases.
>> > Maybe I'm missing a query shape or planner condition required to
>> > trigger the new functionality. It would help to know which cases are
>> > currently expected to be pushed down.
>> >
>> > Regards,
>> > Solaimurugan V
>>
>>
>> Hi. Of course, this depends on foreign scan cost. Foreign join should
>> be
>> cheaper than local one.
>> The cost of foreign join consists from
>> 1) cost of evaluating inner and outer part of the join,
>> 2) total costs of retreiving rows (nrows * cost of qual evaluation),
>> 3) cost of preparing to evaluate quals (remote and local),
>> 4) fdw_startup_cost
>> 5) (fdw_tuple_cost + cpu_tuple_cost) * retrieved_rows.
>> Look at estimate_path_cost_size().
>>
>> The cost of retreiving foreign rows consists from
>> 1) cost of estimating baserestrictinfos
>> 2) cost of remote sequential scan
>> 3) cost of evaluating foreign relation target list
>> 4) fdw_startup_cost
>> 5) (fdw_tuple_cost + cpu_tuple_cost) * retrieved_rows.
>>
>> If we compare function scan pushdown for foreign table and function
>> with
>> foreign join pushdown of two foreign tables, we don't have two
>> penalties from fdw_startup_cost here. So, for function scan pushdown
>> to
>> be chosen, it should filter out significant part of data, as data
>> transfer cost likely dominates in foreign scan costs estimation. You
>> can
>> see this on the following toy example.
>>
>>
>> CREATE EXTENSION postgres_fdw;
>>
>> SELECT current_database() AS current_database,
>> current_setting('port') AS current_port \gset
>> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
>> OPTIONS (dbname :'current_database', port :'current_port');
>> CREATE USER MAPPING FOR current_user SERVER loopback;
>> CREATE table t1 (i int, j text);
>> create foreign table ft1 (i int, j text) server loopback options
>> (table_name 't1');
>>
>> -- No adequate data about relation, we assume 1000+ rows in table, so
>> foreign join is chosen.
>>
>> # explain select * from ft1 , generate_series(1,10) g where i=g;
>> QUERY PLAN
>> ------------------------------------------------------
>> Foreign Scan (cost=100.00..172.16 rows=68 width=40)
>> Relations: (ft1) INNER JOIN (generate_series() g)
>> (2 rows)
>>
>> -- We have actual data, there's no use to push down foreign join
>> postgres=# analyze ft1;
>> ANALYZE
>> postgres=# explain select * from ft1 , generate_series(1,10) g where
>> i=g;
>> QUERY PLAN
>> -----------------------------------------------------------------------------
>> Hash Join (cost=100.02..100.16 rows=1 width=40)
>> Hash Cond: (g.g = ft1.i)
>> -> Function Scan on generate_series g (cost=0.00..0.10 rows=10
>> width=4)
>> -> Hash (cost=100.00..100.00 rows=1 width=36)
>> -> Foreign Scan on ft1 (cost=100.00..100.00 rows=1
>> width=36)
>>
>> -- If we had two foreign scans below join here, its startup cost would
>> be more than 2*100.
>> -- But evaluating function locally saves us from such great penalty.
>> -- Let's add some actual data.
>>
>> postgres=# insert into ft1 select i, 'test'||i from
>> generate_series(1,1000) i;
>> INSERT 0 1000
>> postgres=# analyze ft1;
>> ANALYZE
>>
>> -- Now we have actual data, transfering 1000 rows (plain foreign scan
>> costs ~ 410) and joining locally is more expensive than doing
>> -- remote join and transfering 10 rows.
>>
>> postgres=# explain select * from ft1 , generate_series(1,10) g where
>> i=g;
>> QUERY PLAN
>> ------------------------------------------------------
>> Foreign Scan (cost=100.00..143.20 rows=10 width=15)
>> Relations: (ft1) INNER JOIN (generate_series() g)
>> (2 rows)
>
> Thank you for your work on improving this matter. I've updated the
> patch attempting to address major concerns raised in this thread.
>
> 1) Limit the support of the pushdown with immutable functions. This
> must eliminate the risk of different evaluation results on different
> servers (as long as immutable is set correctly).
> 2) Store pointer to stub fpinfo for the function in the absorbing
> foreign table. Thus, if there are multiple foreign tables, which
> could absorb the function, both ways can be considered. This is
> illustrated in the regression tests: the decision on which foreign
> table would absorb the function can be changed depending on clause
> selectivity.
>
> ------
> Regards,
> Alexander Korotkov
> Supabase
Hi.
I've tested updated patch a bit.
1) deparseColumnRef() doesn't account for whole row vars.
In queries like
UPDATE remote_tbl r SET b=5 FROM UNNEST(array[box '((2,3),(-2,-3))']) AS
t (bx) WHERE r.a = area(t.bx)
it fails with assert that varattno should be > 0. When we lock
non-relation RTE, we select whole row var, and we have to deparse it for
function RTE.
You've removed check for function return type. This seems to be
dangerous. Old example
CREATE OR REPLACE FUNCTION f_ret_record() RETURNS record AS $$ SELECT
(1,2)::record $$ language SQL IMMUTABLE;
ALTER EXTENSION postgres_fdw ADD function f_ret_record();
EXPLAIN (VERBOSE, COSTS OFF)
SELECT s FROM remote_tbl rt, f_ret_record() AS s(a int, b int)
WHERE s.a = rt.a;
fails with
ERROR: a column definition list is required for functions returning
"record"
2) postgresBeginForeignScan() can step on function RTE, and doesn't know
what to do with it:
SELECT * FROM unnest(array[2,3,4]) n, remote_tbl r WHERE r.a = n;
ERROR: cache lookup failed for foreign table 0
So, we need to look for the first RTE_RELATION, as in older patch
version.
3) A lot of complexity in the old patch version was in making it
possible to find out RTE_FUNCTION attribute types after planing, as it's
necessary to correctly handle joins. In this version
get_tupdesc_for_join_scan_tuples() doesn't handle function RTEs. This
means, when we try to find out type for attribute types for joins, we'll
get errors. This can be seen in queries like
UPDATE remote_tbl r SET b=CASE WHEN random()>=0 THEN 5 ELSE 0 END FROM
UNNEST(array[box '((2,3),(-2,-3))']) AS t (bx) WHERE r.a = area(t.bx)
RETURNING a,b;
Now it fails on earlier stages (with "column f2.c0 does not exist"), but
if we fix it, we'll get something like
"ERROR: input of anonymous composite types is not implemented"
Overall, function_rte_pushdown_ok() now allows more strange
constructions. Could it skip Vars from outside of joinrel->relids? Can
we safely ship function with parameters in arguments? I'm not sure.
4) Why do we restrict list_length(rte->functions) to 1? The main reason
for supporting several rte->functions was to allow pushdown of UNNEST()
with several arrays, which is used by HammerDB tproc-c test.
5) We can support pushing down joins of foreign tables and another RTE
types, for example, VALUES. But with new specific way of handling
RTE_FUNCTIONS in core, this requires both changes to
set_foreign_rel_properties() and postgres_fdw. Not sure, if this is a
big problem, but at least is worth mentioning .
--
Best regards,
Alexander Pyhalov,
Postgres Professional
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2026-05-12 08:15:19 | Re: COPY ON_CONFLICT TABLE; save duplicated record to another table. |
| Previous Message | Ayush Tiwari | 2026-05-12 08:02:44 | Re: [PATCH] Fix column name escaping in postgres_fdw stats import |