Re: Add Information during standby recovery conflicts

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-12-05 03:38:12
Message-ID: CAD21AoDLzzZp_3qs7XwRZDV_-AHNW8ZUyQh8Hj1m-Sm+Wv9wGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>
> Hi,
>
> On 12/4/20 2:21 AM, Fujii Masao wrote:
> >
> > On 2020/12/04 9:28, Masahiko Sawada wrote:
> >> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
> >> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>>
> >>>
> >>>
> >>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
> >>>> Hi,
> >>>>
> >>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
> >>>>> CAUTION: This email originated from outside of the organization.
> >>>>> Do not click links or open attachments unless you can confirm the
> >>>>> sender and know the content is safe.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
> >>>>> <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >>>>>> On 2020-Dec-01, Fujii Masao wrote:
> >>>>>>
> >>>>>>> + if (proc)
> >>>>>>> + {
> >>>>>>> + if (nprocs == 0)
> >>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
> >>>>>>> + else
> >>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
> >>>>>>> +
> >>>>>>> + nprocs++;
> >>>>>>>
> >>>>>>> What happens if all the backends in wait_list have gone? In
> >>>>>>> other words,
> >>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs
> >>>>>>> has not been
> >>>>>>> incrmented at all)? This would very rarely happen, but can happen.
> >>>>>>> In this case, since buf.data is empty, at least there seems no
> >>>>>>> need to log
> >>>>>>> the list of conflicting processes in detail message.
> >>>>>> Yes, I noticed this too; this can be simplified by changing the
> >>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
> >>>>>> wait_list being null), otherwise not print the errdetail. (You
> >>>>>> could
> >>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
> >>>>> +1
> >>>>>
> >>>>> Maybe we can also improve the comment of this function from:
> >>>>>
> >>>>> + * This function also reports the details about the conflicting
> >>>>> + * process ids if *wait_list is not NULL.
> >>>>>
> >>>>> to " This function also reports the details about the conflicting
> >>>>> process ids if exist" or something.
> >>>>>
> >>>> Thank you all for the review/remarks.
> >>>>
> >>>> They have been addressed in the new attached patch version.
> >>>
> >>> Thanks for updating the patch! I read through the patch again
> >>> and applied the following chages to it. Attached is the updated
> >>> version of the patch. Could you review this version? If there is
> >>> no issue in it, I'm thinking to commit this version.
> >>
> >> Thank you for updating the patch! I have one question.
> >>
> >>>
> >>> + timeouts[cnt].id = STANDBY_TIMEOUT;
> >>> + timeouts[cnt].type = TMPARAM_AFTER;
> >>> + timeouts[cnt].delay_ms = DeadlockTimeout;
> >>>
> >>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
> >>> I changed the code that way.
> >>
> >> As the comment of ResolveRecoveryConflictWithLock() says the
> >> following, a deadlock is detected by the ordinary backend process:
> >>
> >> * Deadlocks involving the Startup process and an ordinary backend
> >> proces
> >> * will be detected by the deadlock detector within the ordinary
> >> backend.
> >>
> >> If we use STANDBY_DEADLOCK_TIMEOUT,
> >> SendRecoveryConflictWithBufferPin() will be called after
> >> DeadlockTimeout passed, but I think it's not necessary for the startup
> >> process in this case.
> >
> > Thanks for pointing this! You are right.
> >
> >
> >> If we want to just wake up the startup process
> >> maybe we can use STANDBY_TIMEOUT here?
> >
> Thanks for the patch updates! Except what we are still discussing below,
> it looks good to me.
>
> > When STANDBY_TIMEOUT happens, a request to release conflicting buffer
> > pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?
>
> Agree
>
> >
> > Or, first of all, we don't need to enable the deadlock timer at all?
> > Since what we'd like to do is to wake up after deadlock_timeout
> > passes, we can do that by changing ProcWaitForSignal() so that it can
> > accept the timeout and giving the deadlock_timeout to it. If we do
> > this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
> > ResolveRecoveryConflictWithLock(). Thought?

Where do we enable deadlock timeout in hot standby case? You meant to
enable it in ProcWaitForSignal() or where we set a timer for not hot
standby case, in ProcSleep()?

>
> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers
> a call to StandbyLockTimeoutHandler() which does nothing, except waking
> up. That's what we want, right?)

Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup
process can wake up and do nothing. Thank you for pointing out.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2020-12-05 03:51:48 Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Previous Message Bruce Momjian 2020-12-05 03:32:45 Re: Proposed patch for key managment