Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: sean(dot)johnston(at)edgeintelligence(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org, etsuro(dot)fujita(at)gmail(dot)com
Subject: Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)
Date: 2019-04-26 12:39:46
Message-ID: 5CC2FC12.6050409@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

(2019/04/26 3:24), Tom Lane wrote:
> PG Bug reporting form<noreply(at)postgresql(dot)org> writes:
>> [ this crashes if ft4 is a postgres_fdw foreign table: ]
>> select exists(select c1 from ft4), avg(c1) from ft4 where c1 = (select
>> max(c1) from ft4);
>
> Hm, the max() subquery isn't necessary, this is sufficient:
>
> select exists(select c1 from ft4), avg(c1) from ft4 where c1 = 42;
>
> That produces a plan like
>
> QUERY PLAN
> -----------------------------------------------------------------------------------
> Foreign Scan (cost=200.07..246.67 rows=1 width=33)
> Output: ($0), (avg(ft4.c1))
> Relations: Aggregate on (public.ft4)
> Remote SQL: SELECT $1::boolean, avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 432))
> InitPlan 1 (returns $0)
> -> Foreign Scan on public.ft4 ft4_1 (cost=100.00..212.39 rows=3413 width=0)
> Remote SQL: SELECT NULL FROM "S 1"."T 3"
> (7 rows)
>
> Now one's first observation about that is that it's kind of dumb to send
> the result of the locally-executed InitPlan over to the far end only to
> read it back. So maybe we should be thinking about how to avoid that.
> We do avoid it for plain foreign scans:
>
> regression=# explain verbose
> select exists(select c1 from ft4), * from ft4 where c1 = 42;
> QUERY PLAN
> -----------------------------------------------------------------------------------
> Foreign Scan on public.ft4 (cost=200.03..226.15 rows=6 width=41)
> Output: $0, ft4.c1, ft4.c2, ft4.c3
> Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 = 42))
> InitPlan 1 (returns $0)
> -> Foreign Scan on public.ft4 ft4_1 (cost=100.00..212.39 rows=3413 width=0)
> Remote SQL: SELECT NULL FROM "S 1"."T 3"
> (6 rows)
>
> and also for foreign joins:
>
> regression=# explain verbose
> select exists(select c1 from ft4), * from ft4, ft4 ft4b where ft4.c1 = 42 and ft4b.c1 = 43;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> Foreign Scan (cost=200.03..252.93 rows=36 width=81)
> Output: $0, ft4.c1, ft4.c2, ft4.c3, ft4b.c1, ft4b.c2, ft4b.c3
> Relations: (public.ft4) INNER JOIN (public.ft4 ft4b)
> Remote SQL: SELECT r1.c1, r1.c2, r1.c3, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 3" r1 INNER JOIN "S 1"."T 3" r2 ON (((r2.c1 = 43)) AND ((r1.c1 = 42))))
> InitPlan 1 (returns $0)
> -> Foreign Scan on public.ft4 ft4_1 (cost=100.00..212.39 rows=3413 width=0)
> Remote SQL: SELECT NULL FROM "S 1"."T 3"
> (7 rows)
>
>
> but the code for upper-relation scans is apparently stupider than either
> of those cases.
>
> The proximate cause of the crash is that we have {PARAM 1}
> (representing the output of the InitPlan) in the path's fdw_exprs, and
> also the identical expression in fdw_scan_tlist, and that means that when
> setrefs.c processes the ForeignScan node it thinks it should replace the
> {PARAM 1} in fdw_exprs with a Var representing a reference to the
> fdw_scan_tlist entry. That would be fine if the fdw_exprs represented
> expressions to be evaluated over the output of the foreign scan, but of
> course they don't --- postgres_fdw uses fdw_exprs to compute values to be
> sent to the remote end, instead. So we crash at runtime because there's
> no slot to supply such output to the fdw_exprs.
>
> I was able to make the crash go away by removing this statement from
> set_foreignscan_references:
>
> fscan->fdw_exprs = (List *)
> fix_upper_expr(root,
> (Node *) fscan->fdw_exprs,
> itlist,
> INDEX_VAR,
> rtoffset);
>
> and we still pass check-world without that (which means we lack test
> coverage, because the minimum that should happen to fdw_exprs is
> fix_scan_list :-(). But I do not think that's an acceptable route to
> a patch, because it amounts to having the core code know what the FDW
> is using fdw_exprs for, and we explicitly disclaim any assumptions about
> that. fdw_exprs is specified to be processed the same as other
> expressions in the same plan node, so I think this fix_upper_expr call
> probably ought to stay like it is, even though it's not really the right
> thing for postgres_fdw. It might be the right thing for other FDWs.
>
> (One could imagine, perhaps, having some flag in the ForeignPlan
> node that tells setrefs.c what to do. But that would be an API break
> for FDWs, so it wouldn't be a back-patchable solution.)
>
> (Actually, it seems to me that set_foreignscan_references is *already*
> assuming too much about the semantics of these expressions in upper
> plan nodes, so maybe we need to have a chat about that anyway.)
>
> If we do leave it like this, then the only way for postgres_fdw to
> avoid trouble is to not have any entries in fdw_exprs that exactly
> match entries in fdw_scan_tlist. So that pretty much devolves back
> to what I said before: don't ship values to the far end that are
> just going to be fed back as-is. But now it's a correctness
> requirement not just an optimization.

Thanks for taking care of this, as usual!

> I haven't had anything to do with postgres_fdw's upper-relation-pushdown
> code, so I am not sure why it's stupider than the other cases.
> Thoughts anybody?

I worked on the ORDERED/FINAL-upperrel pushdown for PG12, but I don't
think that that's directly related to this issue, because this arises in
PG11 already. Maybe I'm missing something, but the
UPPERREL_GROUP_AGG-upperrel pushdown added in PG10 is likely to be
related to this. I'll work on this issue unless somebody wants to. But
I'll take a 10-day vocation from tomorrow, so I don't think I'll be able
to fix this in the next minor release...

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message neelaveni 2019-04-26 12:44:13 Re: Reg: Postgresql8.3 Using on Ubuntu
Previous Message Flo Rance 2019-04-26 12:21:31 Re: Reg: Postgresql8.3 Using on Ubuntu

Browse pgsql-hackers by date

  From Date Subject
Next Message Rafia Sabih 2019-04-26 12:49:46 Re: [PATCH v1] Show whether tables are logged in \dt+
Previous Message Peter Eisentraut 2019-04-26 12:32:15 Re: "Routine Reindexing" docs should be updated to reference REINDEX CONCURRENTLY