| From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
|---|---|
| To: | solaimurugan vellaipandiyan <drsolaimurugan(dot)v(at)gmail(dot)com> |
| Cc: | Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, g(dot)kashkin(at)postgrespro(dot)ru, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Function scan FDW pushdown |
| Date: | 2026-05-11 12:02:24 |
| Message-ID: | CAPpHfduq=DZjuqysuvu7qqneYcr0FUn-auE4yJwPbM1GLGqR9A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, May 11, 2026 at 9:01 AM solaimurugan vellaipandiyan <
drsolaimurugan(dot)v(at)gmail(dot)com> wrote:
> Thanks for the detailed explanation and the example queries. That
> helped me better understand the costing behavior behind the pushdown
> decisions.
>
> I tested the new v4 patch on current master using a postgres_fdw
> loopback setup with local table t1 and foreign table ft1.
>
> Here’s what I observed:
> - Before ANALYZE, the planner chose a local Hash Join with separate
> Foreign Scan and Function Scan nodes.
> - After running ANALYZE on the empty table, it still preferred local
> execution, which makes sense based on the costing explanation.
> - I then inserted 1000 rows into t1, ran ANALYZE again, and repeated the
test.
>
> Even after that, I still got a local Hash Join plan like:
> Hash Join
> - Foreign Scan on ft1
> - Function Scan on generate_series
> I wasn’t able to observe the pushed-down Foreign Scan plan shown in
> the example from the thread.
>
> The patch itself applied and built successfully on my side, so this
> may just be due to planner cost differences or environment-specific
> behavior on current master.
This also comes from the cost model. Check this example.
# CREATE TABLE t1 (id int);
# INSERT INTO t1 SELECT g FROM generate_series(1, 1000) g;
# CREATE FOREIGN TABLE ft1 (id int) SERVER loopback OPTIONS (table_name
't1');
# ANALYZE t1;
# ANALYZE ft1;
By default the local join is selected.
# EXPLAIN (VERBOSE, COSTS ON)
SELECT * FROM ft1, generate_series(1, 100) AS g(id)
WHERE ft1.id = g.id;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Hash Join (cost=102.25..332.00 rows=100 width=8)
Output: ft1.id, g.id
Hash Cond: (ft1.id = g.id)
-> Foreign Scan on public.ft1 (cost=100.00..325.00 rows=1000 width=4)
Output: ft1.id
Remote SQL: SELECT id FROM public.t1
-> Hash (cost=1.00..1.00 rows=100 width=4)
Output: g.id
-> Function Scan on pg_catalog.generate_series g
(cost=0.00..1.00 rows=100 width=4)
Output: g.id
Function Call: generate_series(1, 100)
(11 rows)
However, we can force remote join using enable_* options. You can see it
has higher cost. This is because estimate_path_cost_size() expects join
operator to be applied to the whole cross-product.
# SET enable_hashjoin = off;
# SET enable_mergejoin = off;
# SET enable_nestloop = off;
# EXPLAIN (VERBOSE, COSTS ON)
SELECT * FROM ft1, generate_series(1, 100) AS g(id)
WHERE ft1.id = g.id;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Hash Join (cost=102.25..332.00 rows=100 width=8)
Output: ft1.id, g.id
Hash Cond: (ft1.id = g.id)
-> Foreign Scan on public.ft1 (cost=100.00..325.00 rows=1000 width=4)
Output: ft1.id
Remote SQL: SELECT id FROM public.t1
-> Hash (cost=1.00..1.00 rows=100 width=4)
Output: g.id
-> Function Scan on pg_catalog.generate_series g
(cost=0.00..1.00 rows=100 width=4)
Output: g.id
Function Call: generate_series(1, 100)
(11 rows)
# RESET enable_hashjoin;
# RESET enable_mergejoin;
# RESET enable_nestloop;
Also, this can be fixed using remote estimate. You also can check that v3
regression tests by Pyhalov use this approach as well.
# ALTER FOREIGN TABLE ft1 OPTIONS (ADD use_remote_estimate 'true');
# EXPLAIN (VERBOSE, COSTS ON)
SELECT * FROM ft1, generate_series(1, 100) AS g(id)
WHERE ft1.id = g.id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=102.25..143.00 rows=100 width=8)
Output: ft1.id, g.id
Relations: (public.ft1) INNER JOIN (Function g)
Remote SQL: SELECT r1.id, f2.c1 FROM (public.t1 r1 INNER JOIN
generate_series(1, 100) f2(c1) ON (((r1.id = f2.c1))))
(4 rows)
Thus, I don't see it to be a problem of this specific patch. I think this
is general inaccuracy of postgres_fdw cost model.
------
Regards,
Alexander Korotkov
Supabase
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2026-05-11 12:03:17 | Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column |
| Previous Message | Tomas Vondra | 2026-05-11 11:44:13 | Re: Parallel INSERT SELECT take 2 |