Re: Transaction timeout

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transaction timeout
Date: 2022-12-06 00:22:47
Message-ID: 20221206002247.w3qfxdg64f7arl2z@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-12-05 15:41:29 -0800, Andrey Borodin wrote:
> Thanks for looking into this Andres!
>
> On Mon, Dec 5, 2022 at 3:07 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > I'm a bit worried about adding evermore branches and function calls for
> > the processing of single statements. We already spend a noticable
> > percentage of the cycles for a single statement in PostgresMain(), this
> > adds additional overhead.
> >
> > I'm somewhat inclined to think that we need some redesign here before we
> > add more overhead.
> >
> We can cap statement_timeout\idle_session_timeout by the budget of
> transaction_timeout left.

I don't know what you mean by that.

> @@ -3277,6 +3282,7 @@ ProcessInterrupts(void)
> */
> lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
> stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);
> + tx_timeout_occurred = get_timeout_indicator(TRANSACTION_TIMEOUT, true);
>
> /*
> * If both were set, we want to report whichever timeout completed

This doesn't update the preceding comment, btw, which now reads oddly:

/*
* If LOCK_TIMEOUT and STATEMENT_TIMEOUT indicators are both set, we
* need to clear both, so always fetch both.
*/

> > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void)
> > > SetLatch(MyLatch);
> > > }
> > >
> > > +static void
> > > +TransactionTimeoutHandler(void)
> > > +{
> > > +#ifdef HAVE_SETSID
> > > + /* try to signal whole process group */
> > > + kill(-MyProcPid, SIGINT);
> > > +#endif
> > > + kill(MyProcPid, SIGINT);
> > > +}
> > > +
> >
> > Why does this use signals instead of just setting the latch like
> > IdleInTransactionSessionTimeoutHandler() etc?
>
> I just copied statement_timeout behaviour. As I understand this
> implementation is prefered if the timeout can catch the backend
> running at full steam.

Hm. I'm not particularly convinced by that code. Be that as it may, I
don't think it's a good idea to have one more copy of this code. At
least the patch should wrap the signalling code in a helper.

FWIW, the HAVE_SETSID code originates in:

commit 3ad0728c817bf8abd2c76bd11d856967509b307c
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: 2006-11-21 20:59:53 +0000

On systems that have setsid(2) (which should be just about everything except
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process. This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command. Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system(). (There is no support in the
core backend for that, but it's widely done using untrusted PLs.) Per gripe
from Stephen Harris and subsequent discussion.

> > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
> > > index 0081873a72..5229fe3555 100644
> > > --- a/src/bin/pg_dump/pg_backup_archiver.c
> > > +++ b/src/bin/pg_dump/pg_backup_archiver.c
> > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH)
> > > ahprintf(AH, "SET statement_timeout = 0;\n");
> > > ahprintf(AH, "SET lock_timeout = 0;\n");
> > > ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n");
> > > + ahprintf(AH, "SET transaction_timeout = 0;\n");
> >
> > Hm - why is that the right thing to do?
> Because transaction_timeout has effects of statement_timeout.

I guess it's just following precedent - but it seems a bit presumptuous
to just disable safety settings a DBA might have set up. That makes some
sense for e.g. idle_in_transaction_session_timeout, because I think
e.g. parallel backup can lead to a connection being idle for a bit.

A few more review comments:

> Either way we can do batch function enable_timeouts() instead
> enable_timeout_after().

> Does anything of it make sense?

I'm at least as worried about the various calls *after* the execution of
a statement.

> + if (tx_timeout_occurred)
> + {
> + LockErrorCleanup();
> + ereport(ERROR,
> + (errcode(ERRCODE_TRANSACTION_TIMEOUT),
> + errmsg("canceling transaction due to transaction timeout")));
> + }

The number of calls to LockErrorCleanup() here feels wrong - there's
already 8 calls in ProcessInterrupts(). Besides the code duplication I
also think it's not a sane idea to rely on having LockErrorCleanup()
before all the relevant ereport(ERROR)s.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-12-06 00:22:57 Re: Add LZ4 compression in pg_dump
Previous Message Tom Lane 2022-12-06 00:18:11 Re: Error-safe user functions