Re: Add Information during standby recovery conflicts

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bdrouvot(at)amazon(dot)com, sawada(dot)mshk(at)gmail(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-12-16 02:22:35
Message-ID: 20201216.112235.1367853844658985673.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2020/12/15 12:04, Kyotaro Horiguchi wrote:
> > [40509:startup] DETAIL: Conflicting processes: 41171, 41194.
...
> > I'm not sure, but it seems to me at least the period is unnecessary
> > here.
>
> Since Error Message Style Guide in the docs says "Detail and hint
> messages:
> Use complete sentences, and end each with a period.", I think that a
> period
> is necessary here. No?

In the first place it is not a complete sentence. Might be better be
something like this if we strictly follow the style guide?

> Conflicting processes are 41171, 41194.
> Conflicting processes are: 41171, 41194.

>
> > + bool maybe_log_conflict =
> > + (standbyWaitStart != 0 && !logged_recovery_conflict);
> > odd indentation.
>
> This is the result of pgindent run. I'm not sure why pgindent indents
> that way, but for now I'd like to follow pgindent.

Agreed.

> > + /* Also, set the timer if necessary */
> > + if (logging_timer)
> > + {
> > + timeouts[cnt].id = STANDBY_LOCK_TIMEOUT;
> > + timeouts[cnt].type = TMPARAM_AFTER;
> > + timeouts[cnt].delay_ms = DeadlockTimeout;
> > + cnt++;
> > + }
> > This doesn't consider spurious wakeup. I'm not sure it actually
> > happenes but we usually consider that. That is since before this
> > patch, but ProcWaitForSignal()'s comment says that:
> >
> >> * As this uses the generic process latch the caller has to be robust
> >> * against
> >> * unrelated wakeups: Always check that the desired state has occurred,
> >> * and
> >> * wait again if not.
> > If we don't care of spurious wakeups, we should add such a comment.
>
> If ProcWaitForSignal() wakes up because of the reason (e.g., SIGHUP)
> other than deadlock_timeout, ProcSleep() will call
> ResolveRecoveryConflictWithLock() and we sleep on ProcWaitForSignal()
> again since the recovery conflict has not been resolved yet. So we can
> say that we consider "spurious wakeups"?

So, the following seems to be spurious wakeups..

> However when I read the related code again, I found another issue in
> the patch. In ResolveRecoveryConflictWithLock(), if SIGHUP causes us
> to
> wake up out of ProcWaitForSignal() before deadlock_timeout is reached,
> we will use deadlock_timeout again when sleeping on
> ProcWaitForSignal().
> Instead, probably we should use the "deadlock_timeout - elapsed time"
> so that we can emit a log message as soon as deadlock_timeout passes
> since starting waiting on recovery conflict. Otherwise it may take at
> most
> "deadlock_timeout * 2" to log "still waiting ..." message.
>
> To fix this issue, "deadlock_timeout - elapsed time" needs to be used
> as
> the timeout when enabling the timer at least in
> ResolveRecoveryConflictWithLock() and
> ResolveRecoveryConflictWithBufferPin().
> Also similar change needs to be applied to
> ResolveRecoveryConflictWithVirtualXIDs().
>
> BTW, without applying the patch, *originally*
> ResolveRecoveryConflictWithBufferPin() seems to have this issue.
> It enables deadlock_timeout timer so as to request for hot-standbfy
> backends to check themselves for deadlocks. But if we wake up out of
> ProcWaitForSignal() before deadlock_timeout is reached, the subsequent
> call to ResolveRecoveryConflictWithBufferPin() also uses
> deadlock_timeout
> again instead of "deadlock_timeout - elapsed time". So the request for
> deadlock check can be delayed. Furthermore,
> if ResolveRecoveryConflictWithBufferPin() always wakes up out of
> ProcWaitForSignal() before deadlock_timeout is reached, the request
> for deadlock check may also never happen infinitely.
>
> Maybe we should fix the original issue at first separately from the
> patch.

Yeah, it's not an issue of this patch.

> > + bool maybe_log_conflict;
> > + bool maybe_update_title;
> > Although it should be a matter of taste and I understand that the
> > "maybe" means that "that logging and changing of ps display may not
> > happen in this iteration" , that variables seem expressing
> > respectively "we should write log if the timeout for recovery conflict
> > expires" and "we should update title if 500ms elapsed". So they seem
> > *to me* better be just "log_conflict" and "update_title".
> > I feel the same with "maybe_log_conflict" in ProcSleep().
>
> I have no strong opinion about those names. So if other people also
> think so, I'm ok to rename them.

Shorter is better as far as it makes sense and not-too abbreviated.

> > + for recovery conflicts. This is useful in determining if recovery
> > + conflicts prevent the recovery from applying WAL.
> > (I'm not confident on this) Isn't the sentence better be in past or
> > present continuous tense?
>
> Could you tell me why you think that's better?

To make sure, I mentioned about the "prevent". The reason for that is
it represents the status at the present or past. I don't insist on
that if you don't think it's not better.

> for recovery conflicts. This is useful in determining if recovery
> conflicts are preventing the recovery from applying WAL.

> >> BTW, attached is the POC patch that implements the idea discussed
> >> upthread;
> >> if log_recovery_conflict_waits is enabled, the startup process reports
> >> the log also after the recovery conflict was resolved and the startup
> >> process
> >> finished waiting for it. This patch needs to be applied after
> >> v11-0002-Log-the-standby-recovery-conflict-waits.patch is applied.
> > Ah. I was just writing a comment about that. I haven't looked it
> > closer but it looks good to me. By the way doesn't it contains a
> > simple fix of a comment for the base patch?
>
> Yes, so the typo included in the base patch should be fixed when
> pushing it.

Understood. Thanks.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-12-16 02:31:46 Re: Misleading comment in prologue of ReorderBufferQueueMessage
Previous Message Kyotaro Horiguchi 2020-12-16 02:01:20 Re: archive status ".ready" files may be created too early