Bogus EPQ plan construction in postgres_fdw

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Bogus EPQ plan construction in postgres_fdw
Date: 2018-12-12 20:00:03
Message-ID: 8946.1544644803@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

By chance I noticed that postgres_fdw's postgresGetForeignPlan() assumes
--- without any checking --- that the outer_plan it's given for a join
relation must have a NestLoop, MergeJoin, or HashJoin node at the top.
That's been wrong at least since commit 4bbf6edfb (which could cause
insertion of a Sort node on top) and it seems like a pretty unsafe
thing to Just Assume even without that.

There are two ways in which this faulty assumption is materialized.
One is that a new targetlist is jammed into the plan node without
any check as to whether it's really safe to do that, ie whether the
node is projection-capable or not. Of course, a Sort node is *not*
projection-capable, meaning that we get a broken plan tree that will
not deliver the set of columns that postgres_fdw wanted.

This would be a security problem, similar to other recent problems where
the output datatypes of a plan node are misidentified, except that through
blind good fortune it seems impossible to reach the problem at execution.
The jammed-in targetlist is always identical to the subplan's original
targetlist (i.e., it's the Vars-only targetlist the planner constructed
for the joinrel) except at the topmost join level --- and if this is
happening at the topmost level, then the foreign join must encompass all
relations in the query. That means that an EPQ recheck will never happen
(since EPQ could fire only if there's at least one non-postgres_fdw
relation involved), so the broken plan tree will never be executed.

There remains a cosmetic problem, which is that since the Sort node's
tlist doesn't match reality, EXPLAIN sometimes gets confused and prints
garbage info about the Sort's sortkeys. This is in fact visible in
current regression test results:

-> Sort
Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
Sort Key: t1.c3 USING <, t1.c1

The "USING <" bit should only appear if the sort key is using a nondefault
sort operator, which surely isn't the case here, so why's that there?
The reason is that EXPLAIN is looking at the bogus tlist and drawing the
wrong conclusion about the sort key's datatype, which naturally leads it
to draw the wrong conclusion about whether the operator is the default.

The second way in which postgresGetForeignPlan() is broken is that it
does this without any check on the plan node's actual type:

if (join_plan->jointype == JOIN_INNER)
join_plan->joinqual = list_delete(join_plan->joinqual,
qual);

This would certainly lead to a crash if list_delete were invoked on
some bits that weren't a pointer-to-List. In the case of a Sort node,
we're again escaping bad consequences through blind good fortune:
Join.jointype is at the same struct offset as Sort.numCols, which will
never be zero. Hence, if JoinType is the same size as int, or if it's
smaller but the machine is little-endian, jointype will not read as
JOIN_INNER (zero) so nothing bad happens. On a big-endian machine where
enum types can be smaller than int, this'd likely crash in list_delete.
The fact that none of our big-endian buildfarm critters have failed on
the regression tests suggests that that combination may not exist in
reality. (We'd also have a problem if the plan node was neither a join
nor a Sort, but I'm unsure whether such a case is actually reachable.)

Hence, I'm planning to apply the attached, which removes both of the
unwarranted assumptions. In the qual-deletion step, we can just not
try to delete quals if it's not recognizably a join node; that's only
a minor optimization that we can skip. The other problem can be fixed
by checking whether the node is projection-capable and inserting a
Result node if it isn't. createplan.c has logic for exactly that,
but it wasn't exposed in a usable form, so I modified createplan.c
to provide a function for this.

regards, tom lane

Attachment Content-Type Size
fix-bogus-epq-plan-construction-1.patch text/x-diff 21.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message PG Bug reporting form 2018-12-12 20:00:45 BUG #15548: Unaccent does not remove combining diacritical characters
Previous Message Tomas Vondra 2018-12-12 18:28:29 Re: COPY FROM WHEN condition