Re: fix crash with Python 3.11

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fix crash with Python 3.11
Date: 2022-02-07 22:38:01
Message-ID: 1103370.1644273481@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> writes:
> I've been struggling to make progress on this. First, throwing the
> Python exception suggested in #1 above would require a significant
> amount of new code. (We have code to create an exception out of
> ErrorData, but no code to make one up from nothing.) This would need
> further thought on how to arrange the code sensibly. Second, calling
> AbortOutOfAnyTransaction() appears to clean up so much stuff that even
> the data needed for error handling in PL/Python is wiped out.

Yeah, it's messy.

> One way to way to address the first problem is to not handle the "cannot
> commit while a subtransaction is active" and similar cases manually but
> let _SPI_commit() throw the error and then check in PL/Python for the
> error code ERRCODE_INVALID_TRANSACTION_TERMINATION. (The code in
> _SPI_commit() is there after all explicitly for PLs, so if we're not
> using it, then we're doing it wrong.)

TBH, I've thought from day one that _SPI_commit and friends are unusably
simplistic, precisely because they throw all this error-recovery
complexity onto the caller, and provide no tools to handle it without
dropping down to a lower level of abstraction.

> But in any case, for either implementation it seems then you'd get
> different error behavior from .commit and .rollback calls depending on
> the specific error, which seems weird.

Well, I think we *have to* do that. The INVALID_TRANSACTION_TERMINATION
errors mean that we're in some kind of atomic execution context that
we mustn't destroy. Other errors mean that the commit failed, and that
has to be cleaned up somehow.

(Note: there is also an INVALID_TRANSACTION_TERMINATION error in
HoldPinnedPortals, which is now seeming like a mistake to me; we should
change that to some other errcode, perhaps. There are no others
besides _SPI_commit/_SPI_rollback.)

> Altogether, maybe the response to

>>> The existing test cases apparently fail to trip over that
>>> because Python just throws the exception again, but what if someone
>>> writes a plpython function that catches the exception and then tries
>>> to perform new SPI actions?

> perhaps should be, don't do that then?

That seems far south of acceptable to me. I experimented with the
attached script, which in HEAD just results in aborting the Python
script -- not good, but at least the general state of the session
is okay. With your patch, I get

INFO: sqlstate: 23503
INFO: after exception
WARNING: buffer refcount leak: [8760] (rel=base/16384/38401, blockNum=0, flags=0x93800000, refcount=1 1)
WARNING: relcache reference leak: relation "testfk" not closed
WARNING: relcache reference leak: relation "testpk" not closed
WARNING: TupleDesc reference leak: TupleDesc 0x7f3a12e0de80 (38403,-1) still referenced
WARNING: snapshot 0x1cba150 still active
DO
f1
----
0
1
(2 rows)

So aside from all the internal problems, we've actually managed to commit
an invalid database state. I do not find that OK.

I think that maybe we could get somewhere by having _SPI_commit/rollback
work like this:

1. Throw the INVALID_TRANSACTION_TERMINATION errors if appropriate.
The caller can catch those and proceed if desired, knowing that the
transaction system is in the same state it was.

2. Use a PG_TRY block to do the commit or rollback. On error,
roll back (knowing that we are not in a subtransaction) and then
re-throw the error.

If the caller catches an error other than INVALID_TRANSACTION_TERMINATION,
it can proceed, but it's still on the hook to do SPI_start_transaction
before it attempts any new database access (just as it would be if
there had not been an error).

We could provide a simplified API in which SPI_start_transaction is
automatically included for either success or failure, so that the caller
doesn't have the delayed-cleanup responsibility. I'm not actually sure
that there is any situation where that couldn't be done, but putting it
into the existing functions would be a API break (... unless we turn
SPI_start_transaction into a no-op, which might be OK?) Basically this'd
be making the behavior of SPI_commit_and_chain the norm, except you'd
have the option of reverting to default transaction settings instead
of the previous ones.

So basically where we'd end up is that plpython would do about what
you show in your patch, but then there's additional work needed in
spi.c so that we're not leaving the rest of the system in a bad state.

regards, tom lane

Attachment Content-Type Size
python_commit_failure.sql text/x-python 518 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2022-02-07 23:57:02 RE: logical replication empty transactions
Previous Message Greg Stark 2022-02-07 22:00:32 WaitLatchOrSocket seems to not count to 4 right...