Re: BUG #15990: PROCEDURE throws "SQL Error [XX000]: ERROR: no known snapshots" with PostGIS geometries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, a(dot)wicht(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15990: PROCEDURE throws "SQL Error [XX000]: ERROR: no known snapshots" with PostGIS geometries
Date: 2021-05-18 19:45:04
Message-ID: 3362608.1621367104@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I wrote:
> Yeah, on further thought, the real question to be asking here is
> "what's protecting that in-flight datum from becoming a dangling
> pointer?". AFAICT, the answer right now is "nothing". Therefore,
> it is *never* okay for plpgsql to be executing without a registered
> transaction snapshot; and it seems quite unlikely that it'd be any
> safer for any other PL.

Here's a draft patch for this issue.

I concluded that the reason things work for a normal DO block
or plpgsql procedure (without internal transaction management)
is that pquery.c -- in particular, PortalRunUtility -- establishes
a snapshot for the duration of execution of the portal that is
running the CALL or DO. That allows us to detoast the results of
a SPI query even though SPI has released whatever statement-level
snapshot it was using. Hence, what we should do to fix this is
to try to restore that state of affairs after an internal commit.
We can't just make a snapshot instantly, because it may be that
the user wants to issue SET command(s) to change the transaction
mode and/or LOCK commands to acquire locks before a new
transaction snapshot is taken. This means that _SPI_execute_plan
needs to deal with re-creating the Portal-level snapshot, but not
till just before executing a command that has to have a snapshot.

(Of course, fixing it in _SPI_execute_plan only fixes things for
PLs that go through SPI. But if you don't, you were pretty much
on your own before, and you still are.)

So the attached patch removes PortalRunUtility's local state in
favor of adding a Portal field that tracks the snapshot provided
by the Portal, if any. Then when we make a new snapshot that
logically substitutes for the portal's original snapshot, we can
store it in that field and it'll be released when we finally
exit the portal.

There's a loose end that this doesn't really resolve, which is
that a procedure could return output data that involves a toast
pointer. If the procedure's last action was COMMIT, we'll
come back to ExecuteCallStmt with no snapshot that protects the
validity of that toast pointer. For the moment I just added a
comment that says PLs mustn't do that. I think that is safe
for plpgsql, because it treats output parameters like local
variables, and it already knows it mustn't store toast pointers
in procedure local variables. It's probably all right for our
other PLs too, because they don't really deal in direct use of
toasted datums anyway. But it still seems scary, because there is
really no good way to detect a violation.

A couple of other bits of unfinished business:

* This basically moves all of the snapshot management logic into
_SPI_execute_plan, making "no_snapshots" quite a misnomer.
It kind of was already, because of its contribution to whether
we pass down PROCESS_UTILITY_QUERY_NONATOMIC, but now it's a
flat out lie. We still need a flag to control what happens with
PROCESS_UTILITY_QUERY_NONATOMIC, so I renamed it to
allow_nonatomic. For the moment I just did that renaming locally
in _SPI_execute_plan. In HEAD, we ought to propagate that name
into SPIExecuteOptions, but I haven't done that here.

* I think we could remove the PLPGSQL_STMT_SET infrastructure
and go back to treating SET/RESET the same as other utility
commands so far as plpgsql is concerned. I haven't done that
here either, because it wouldn't be appropriate to do in the
back branches (if only for fear of breaking pldebugger).

Note that we still need the other patch to discourage plpgsql
from leaving not-yet-detoasted datums hanging around in a
FOR loop's fetch queue.

regards, tom lane

Attachment Content-Type Size
restore-portal-snapshot-after-commit-1.patch text/x-diff 25.1 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message James Coleman 2021-05-18 20:36:02 Less selective index chosen unexpectedly
Previous Message Pavel Stehule 2021-05-18 16:54:51 Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE