From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Cc: | kristianlejao(at)gmail(dot)com |
Subject: | TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c |
Date: | 2025-08-05 18:00:54 |
Message-ID: | CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com |
Lists: | pgsql-bugs |
Hi all,
Kristian Lejao (colleague, in CC) has found the following assertion
failure in postgres_fdw.c when rechecking the result tuple via
EvalPlanQual():
TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c",
Line: 2366, PID: 2043518
Here are the reproduction steps, simplified from the ones Kristian
originally created:
1. setup local node:
create extension postgres_fdw;
create server srv foreign data wrapper postgres_fdw options (host
'localhost', port '5433', dbname 'postgres');
create user mapping for public server srv;
create table a (i int primary key);
create foreign table b (i int) server srv;
create foreign table c (i int) server srv;
insert into a values (1);
2. setup remote node:
create table b (i int);
create table c (i int);
insert into b values (1);
insert into c values (1);
3. attach to the backend process started on the local node (say conn1)
using gdb and set a breakpoint at table_tuple_lock().
4. run the following query on conn1 (which stops before locking the
result tuple):
select a.i,
(select 1 from b, c where a.i = b.i and b.i = c.i)
from a
for update;
5. on another session, update the tuple concurrently:
update a set i = i + 1; -- update 1 tuple
6. continue the query on conn1; the server crashes due to the assertion failure.
The plan of the FOR UPDATE query that leads to this issue is:
                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 LockRows  (cost=0.00..615886.00 rows=2550 width=14)
   Output: a.i, ((SubPlan 1)), a.ctid
   ->  Seq Scan on public.a  (cost=0.00..615860.50 rows=2550 width=14)
         Output: a.i, (SubPlan 1), a.ctid
         SubPlan 1
           ->  Foreign Scan  (cost=100.00..241.50 rows=225 width=4)
                 Output: 1
                 Relations: (public.b) INNER JOIN (public.c)
                 Remote SQL: SELECT NULL FROM (public.b r1 INNER JOIN
public.c r2 ON (((r2.i = $1::integer)) AND ((r1.i = $1::integer))))
(9 rows)
The point is that the inner join in the target-list subquery is pushed
down to the foreign server. In postgresGetForeignJoinPaths(), we
prepare a local join path for the EvalPlanQual() check (used later by
postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR
UPDATE/SHARE, as shown below. Here, however, we skip it because the
subquery itself is parsed as a normal SELECT query without rowMarks,
leaving fdw_outerpath of the ForeignScan node NULL:
    /*
     * If there is a possibility that EvalPlanQual will be executed, we need
     * to be able to reconstruct the row using scans of the base relations.
     * GetExistingLocalJoinPath will find a suitable path for this purpose in
     * the path list of the joinrel, if one exists.  We must be careful to
     * call it before adding any ForeignPath, since the ForeignPath might
     * dominate the only suitable local path available.  We also do it before
     * calling foreign_join_ok(), since that function updates fpinfo and marks
     * it as pushable if the join is found to be pushable.
     */
    if (root->parse->commandType == CMD_DELETE ||
        root->parse->commandType == CMD_UPDATE ||
        root->rowMarks)
    {
        epq_path = GetExistingLocalJoinPath(joinrel);
Therefore, if the tuple is concurrently updated before we take the
lock, the tuple is rechecked via EvalPlanQual(), and we hit the
assertion failure because no local join plan was prepared for the recheck.
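To make the chain of events easier to follow, here is a minimal, self-contained C model of the decision described above. It is only a sketch: every `Toy...`/`toy_...` name is hypothetical and stands in for the real planner and executor structures, and it assumes (as described above) that the subquery is checked as a plain SELECT without row marks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the planner state; not the real PostgreSQL types. */
typedef enum { TOY_CMD_SELECT, TOY_CMD_UPDATE, TOY_CMD_DELETE } ToyCmdType;

typedef struct ToyPlannerInfo
{
    ToyCmdType  commandType;    /* models root->parse->commandType */
    bool        hasRowMarks;    /* models root->rowMarks being non-empty */
} ToyPlannerInfo;

/* Same shape as the condition quoted from postgresGetForeignJoinPaths(). */
static bool
toy_wants_epq_path(const ToyPlannerInfo *root)
{
    assert(root != NULL);
    return root->commandType == TOY_CMD_DELETE ||
           root->commandType == TOY_CMD_UPDATE ||
           root->hasRowMarks;
}

/*
 * Models what ends up as the outer plan of the pushed-down join:
 * a local join plan when the check above passes, NULL otherwise.
 */
static const char *
toy_plan_outer(const ToyPlannerInfo *root)
{
    return toy_wants_epq_path(root) ? "local join plan" : NULL;
}

/*
 * Models the recheck precondition: returns false in the situation
 * where the real code trips Assert("outerPlan != NULL").
 */
static bool
toy_recheck_possible(const char *outer_plan)
{
    return outer_plan != NULL;
}
```

With a root of `{ TOY_CMD_SELECT, true }` for the outer FOR UPDATE query, `toy_plan_outer()` returns a plan and the recheck can proceed. With `{ TOY_CMD_SELECT, false }` for the subquery, it returns NULL, which corresponds to the state in which the assertion fires.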
The attached patch includes a draft fix and regression tests (using
injection points).
I don't have enough experience with the planner and FDW code to
evaluate whether the patch fixes the issue in the right way.
Feedback is very welcome. I've confirmed that this assertion failure
can happen with the same scenario on all supported branches.
In addition to that, I realized that none of the regression tests
execute postgresRecheckForeignScan()[1]. I think we need to add
regression tests to cover that function.
Regards,
[1] https://coverage.postgresql.org/contrib/postgres_fdw/postgres_fdw.c.gcov.html#2354()
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-assertion-failure-in-postgresGetForeignJoinPaths.patch | application/x-patch | 5.6 KB |