Re: A new function to wait for the backend exit after termination

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>
Subject: Re: A new function to wait for the backend exit after termination
Date: 2021-03-17 09:19:23
Message-ID: CALNJ-vTzFY9PDbrA=DQPGnU86XmL__nqmr-E+nreueD6Sq35Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
w.r.t. WaitLatch(), if its return value is WL_TIMEOUT, we know the
specified timeout has elapsed.
It seems WaitLatch() can be enhanced to also return the actual duration of
the wait.
This way, the caller can utilize the duration directly.

As for other places where WaitLatch() is called, similar change can be
applied on a per-case basis (with separate patches, not under this topic).

Cheers

On Wed, Mar 17, 2021 at 2:04 AM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Wed, Mar 17, 2021 at 8:28 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Wed, 17 Mar 2021 07:01:39 +0530, Bharath Rupireddy <
> bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> > > Attaching v10 patch for further review.
> >
> > The time-out mechanism doesn't count remainingtime as expected,
> > concretely it does the following.
> >
> > do {
> > kill();
> > WaitLatch(WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, waittime);
> > ResetLatch(MyLatch);
> > remainingtime -= waittime;
> > } while (remainingtime > 0);
> >
> > So, the WaitLatch doesn't consume as much time as the set waittime in
> > case of latch set. remainingtime reduces faster than the real at the
> > iteration.
>
> WaitLatch can exit without waiting for the waittime duration whenever
> the MyLatch is set (SetLatch). Now the question is how frequently
> SetLatch can get called in a backend? For instance, if we keep calling
> pg_reload_conf in any of the backends in the cluster, then the
> SetLatch will be called and the timeout in pg_wait_until_termination
> will be reached fastly. I see that this problem can also exist in
> case of pg_promote function. Similarly it may exist in other places
> where we have WaitLatch for timeouts.
>
> IMO, the frequency of SetLatch calls may not be that much in real time
> scenarios. If at all, the latch gets set too frequently, then the
> terminate and wait functions might timeout earlier. But is it a
> critical problem to worry about? (IMHO, it's not that critical) If
> yes, we might as well need to fix it (I don't know how?) in other
> critical areas like pg_promote?
>
> > It wouldn't happen actually but I concern about PID recycling. We can
> > make sure to get rid of the fear by checking for our BEENTRY instead
> > of PID. However, it seems to me that some additional function is
> > needed in pgstat.c so that we can check the realtime value of
> > PgBackendStatus, which might be too much.
>
> The aim of the wait logic is to ensure that the process is gone from
> the system processes that is why using kill(), not it's entries are
> gone from the shared memory.
>
> > + /* If asked to wait, check whether the timeout value is valid or
> not. */
> > + if (wait && pid != MyProcPid)
> > + {
> > + timeout = PG_GETARG_INT64(2);
> > +
> > + if (timeout <= 0)
> > + ereport(ERROR,
> > +
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > + errmsg("\"timeout\" must not be
> negative or zero")));
> >
> > This means that pg_terminate_backend accepts negative timeouts when
> > terminating myself, which looks odd.
>
> I will change this.
>
> > Is there any reason to reject 0 as timeout?
>
> Actually, timeout 0 should mean that "don't wait" and we can error out
> on negative values. Thoughts?
>
> > + * Wait only if requested and the termination is successful. Self
> > + * termination is allowed but waiting is not.
> > + */
> > + if (wait && pid != MyProcPid && result)
> > + result = pg_wait_until_termination(pid, timeout);
> >
> > Why don't we wait for myself to be terminated? There's no guarantee
> > that myself will be terminated without failure. (I agree that that is
> > not so useful, but I think there's no reason not to do so.)
>
> We could programmatically allow it to wait in case of self termination
> and it doesn't make any difference to the user, they would see
> "Terminating connection due to administrator command" FATAL error. I
> can remove pid != MyProcPid.
>
> > The first suggested signature for pg_terminate_backend() with timeout
> > was pg_terminate_backend(pid, timeout). The current signature (pid,
> > wait?, timeout) looks redundant. Maybe the reason for rejecting 0
> > astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we
> > can wait forever in that case (as other features does). On the other
> > hand pg_terminate_backend(pid, false, 100) is apparently odd but this
> > patch doesn't seem to reject it. If there's no considerable reason
> > for the current signature, I would suggest that:
> >
> > pg_terminate_backend(pid, timeout), where it waits forever if timeout
> > is zero and waits for the timeout if positive. Negative values are not
> > accepted.
>
> So, as stated above, how about a timeout 0 (which is default) telling
> "don't wait", negative error out, a positive milliseconds value
> indicating that we should wait after termination?
>
> And for pg_wait_for_backend_termination timeout 0 or negative, we error
> out?
>
> IMO, the above semantics are better than timeout 0 meaning "wait
> forever". Thoughts?
>
> > + ereport(WARNING,
> > + (errmsg("could not check
> the existence of the backend with PID %d: %m",
> > + pid)));
> > + return false;
> >
> > I think this is worth ERROR. We can avoid this handling if we look
> > into PgBackendEntry instead.
>
> I will change it to ERROR.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-03-17 09:45:33 RE: Parallel Inserts in CREATE TABLE AS
Previous Message Fabien COELHO 2021-03-17 09:17:01 Re: pgsql: Add libpq pipeline mode support to pgbench