Re: Add Information during standby recovery conflicts

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-11-25 15:47:44
Message-ID: 48a1c817-cbde-33c4-2672-13ef37e1d946@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 11/25/20 2:20 PM, Masahiko Sawada wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Fri, Nov 20, 2020 at 6:18 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>> Hi,
>>
>> On 11/17/20 4:44 PM, Fujii Masao wrote:
>>> Thanks for updating the patch! Here are review comments.
>>>
>>> + Controls whether a log message is produced when the startup
>>> process
>>> + is waiting longer than <varname>deadlock_timeout</varname>
>>> + for recovery conflicts.
>>>
>>> But a log message can be produced also when the backend is waiting
>>> for recovery conflict. Right? If yes, this description needs to be
>>> corrected.
>> Thanks for looking at it!
>>
>> I don't think so, only the startup process should write those new log
>> messages.
>>
>> What makes you think that would not be the case?
>>
>>>
>>> + for recovery conflicts. This is useful in determining if
>>> recovery
>>> + conflicts prevents the recovery from applying WAL.
>>>
>>> "prevents" should be "prevent"?
>> Indeed: fixed in the new attached patch.
>>
>>>
>>> + TimestampDifference(waitStart, GetCurrentTimestamp(), &secs,
>>> &usecs);
>>> + msecs = secs * 1000 + usecs / 1000;
>>>
>>> GetCurrentTimestamp() is basically called before LogRecoveryConflict()
>>> is called. So isn't it better to avoid calling GetCurrentTimestamp()
>>> newly in
>>> LogRecoveryConflict() and to reuse the timestamp that we got?
>>> It's helpful to avoid the waste of cycles.
>>>
>> good catch! fixed in the new attached patch.
>>
>>> + while (VirtualTransactionIdIsValid(*vxids))
>>> + {
>>> + PGPROC *proc =
>>> BackendIdGetProc(vxids->backendId);
>>>
>>> BackendIdGetProc() can return NULL if the backend is not active
>>> at that moment. This case should be handled.
>>>
>> handled in the new attached patch.
>>> + case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
>>> + reasonDesc = gettext_noop("recovery is still
>>> waiting recovery conflict on buffer pin");
>>>
>>> It's natural to use "waiting for recovery" rather than "waiting
>>> recovery"?
>>>
>> I would be tempted to say so, the new patch makes use of "waiting for".
>>> + /* Also, set deadlock timeout for logging purpose if
>>> necessary */
>>> + if (log_recovery_conflict_waits)
>>> + {
>>> + timeouts[cnt].id = STANDBY_TIMEOUT;
>>> + timeouts[cnt].type = TMPARAM_AFTER;
>>> + timeouts[cnt].delay_ms = DeadlockTimeout;
>>> + cnt++;
>>> + }
>>>
>>> This needs to be executed only when the message has not been logged yet.
>>> Right?
>>>
>> good catch: fixed in the new attached patch.
>>
> Thank you for updating the patch! Here are review comments on the
> latest version patch.
>
> + while (VirtualTransactionIdIsValid(*vxids))
> + {
> + PGPROC *proc = BackendIdGetProc(vxids->backendId);
> +
> + if (proc)
> + {
> + if (nprocs == 0)
> + appendStringInfo(&buf, "%d", proc->pid);
> + else
> + appendStringInfo(&buf, ", %d", proc->pid);
> +
> + nprocs++;
> + vxids++;
> + }
> + }
>
> We need to increment vxids even if *proc is null. Otherwise, the loop won't end.

My bad, that's fixed.

>
> ---
> + TimestampTz cur_ts = GetCurrentTimestamp();;
Fixed
>
> There is an extra semi-colon.
>
> ---
> int max_standby_streaming_delay = 30 * 1000;
> +bool log_recovery_conflict_waits = false;
> +bool logged_lock_conflict = false;
>
>
> + if (log_recovery_conflict_waits && !logged_lock_conflict)
> + {
> + timeouts[cnt].id = STANDBY_TIMEOUT;
> + timeouts[cnt].type = TMPARAM_AFTER;
> + timeouts[cnt].delay_ms = DeadlockTimeout;
> + cnt++;
> + }
>
> Can we pass a bool indicating if a timeout may be needed for recovery
> conflict logging from ProcSleep() to ResolveRecoveryConflictWithLock()
> instead of using a static variable?

Yeah that makes more sense, specially as we already have
logged_recovery_conflict at our disposal.

New patch version attached.

Thanks!

Bertrand

Attachment Content-Type Size
v9-0003-Log-the-standby-recovery-conflict-waits.patch text/plain 17.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-11-25 15:51:44 Re: A few new options for CHECKPOINT
Previous Message Anastasia Lubennikova 2020-11-25 15:34:48 Re: PoC: custom signal handler for extensions