Re: Transaction timeout

From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transaction timeout
Date: 2022-12-18 20:53:31
Message-ID: CAAhFRxg9i6DrHT1cALBNztsnJ7OR3cawfCmyFA=J6=oDFX+h2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 7, 2022 at 1:30 PM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
> I hope to address other feedback on the weekend.

Andres, here's my progress on working with your review notes.

> > @@ -3277,6 +3282,7 @@ ProcessInterrupts(void)
> > */
> > lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
> > stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);
> > + tx_timeout_occurred = get_timeout_indicator(TRANSACTION_TIMEOUT, true);
> >
> > /*
> > * If both were set, we want to report whichever timeout completed
>
> This doesn't update the preceding comment, btw, which now reads oddly:

I've rewritten this part to correctly report all timeouts that did
happen. However there's now a tricky comma-formatting code which was
tested only manually.

> > > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void)
> > > > SetLatch(MyLatch);
> > > > }
> > > >
> > > > +static void
> > > > +TransactionTimeoutHandler(void)
> > > > +{
> > > > +#ifdef HAVE_SETSID
> > > > + /* try to signal whole process group */
> > > > + kill(-MyProcPid, SIGINT);
> > > > +#endif
> > > > + kill(MyProcPid, SIGINT);
> > > > +}
> > > > +
> > >
> > > Why does this use signals instead of just setting the latch like
> > > IdleInTransactionSessionTimeoutHandler() etc?
> >
> > I just copied statement_timeout behaviour. As I understand this
> > implementation is prefered if the timeout can catch the backend
> > running at full steam.
>
> Hm. I'm not particularly convinced by that code. Be that as it may, I
> don't think it's a good idea to have one more copy of this code. At
> least the patch should wrap the signalling code in a helper.

Done, now there is a single CancelOnTimeoutHandler() handler.

> > > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
> > > > index 0081873a72..5229fe3555 100644
> > > > --- a/src/bin/pg_dump/pg_backup_archiver.c
> > > > +++ b/src/bin/pg_dump/pg_backup_archiver.c
> > > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH)
> > > > ahprintf(AH, "SET statement_timeout = 0;\n");
> > > > ahprintf(AH, "SET lock_timeout = 0;\n");
> > > > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n");
> > > > + ahprintf(AH, "SET transaction_timeout = 0;\n");
> > >
> > > Hm - why is that the right thing to do?
> > Because transaction_timeout has effects of statement_timeout.
>
> I guess it's just following precedent - but it seems a bit presumptuous
> to just disable safety settings a DBA might have set up. That makes some
> sense for e.g. idle_in_transaction_session_timeout, because I think
> e.g. parallel backup can lead to a connection being idle for a bit.

I do not know. My reasoning - everywhere we turn off
statement_timeout, we should turn off transaction_timeout too.
But I have no strong opinion here. I left this code as is in the patch
so far. For the same reason I did not change anything in
pg_backup_archiver.c.

> > Either way we can do batch function enable_timeouts() instead
> > enable_timeout_after().
>
> > Does anything of it make sense?
>
> I'm at least as worried about the various calls *after* the execution of
> a statement.

I think this code is just a one bit check
if (get_timeout_active(TRANSACTION_TIMEOUT))
inside of get_timeout_active(). With all 14 timeouts we have, I don't
see a good way to optimize stuff so far.

> > + if (tx_timeout_occurred)
> > + {
> > + LockErrorCleanup();
> > + ereport(ERROR,
> > + (errcode(ERRCODE_TRANSACTION_TIMEOUT),
> > + errmsg("canceling transaction due to transaction timeout")));
> > + }
>
> The number of calls to LockErrorCleanup() here feels wrong - there's
> already 8 calls in ProcessInterrupts(). Besides the code duplication I
> also think it's not a sane idea to rely on having LockErrorCleanup()
> before all the relevant ereport(ERROR)s.

I've refactored that code down to 7 calls of LockErrorCleanup() :)
Logic behind various branches is not clear for me, e.g. why we do not
LockErrorCleanup() when reading commands from a client?
So I did not risk refactoring further.

> I think the test should verify
> that transaction timeout interacts correctly with statement timeout /
> idle in tx timeout.

I've added tests that check statement_timeout vs transaction_timeout.
However I could not produce stable tests with
idle_in_transaction_timeout vs transaction_timeout so far. But I'll
look into this more.
Actually, stabilizing statement_timeout vs transaction_timeout was
tricky on Windows too. I had to remove the second call to
pg_sleep(0.0001) because it was triggering 10ьs timeout from time to
time. Also, test timeout was increased to 30ms, because unlike others
in spec it's not supposed to happen at the very first SQL statement.

Thank you!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v4-0001-Intorduce-transaction_timeout.patch application/octet-stream 16.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-12-18 22:20:49 Re: New strategies for freezing, advancing relfrozenxid early
Previous Message Andrew Dunstan 2022-12-18 14:42:39 Re: Error-safe user functions