Re: Add Information during standby recovery conflicts

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-08-10 14:43:36
Message-ID: 45fba139-8415-ae3e-0637-f9c698432f00@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 7/31/20 7:12 AM, Masahiko Sawada wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Sat, 11 Jul 2020 at 21:56, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>>
>> On 7/7/20 9:43 AM, Masahiko Sawada wrote:
>>> Fully makes sense, the new patch version attached is now producing:
>>>> 2020-07-06 06:10:36.022 UTC [14035] LOG: waiting for recovery conflict
>>>> on snapshot
>>> How about adding the subject? that is, "recovery is waiting for
>>> recovery conflict on %s" or "recovery process <pid> is waiting for
>>> conflict on %s".
>>
>> The subject is now added in the new attached patch (I did not include
>> the pid as it is part of the log prefix).
>>
>> It now looks like:
>>
>> 2020-07-11 12:00:41.092 UTC [23217] LOG: recovery is waiting for
>> recovery conflict on snapshot
>> 2020-07-11 12:00:41.092 UTC [23217] DETAIL: There is 1 blocking
>> connection(s).
>> 2020-07-11 12:00:41.092 UTC [23217] CONTEXT: WAL redo at 0/4A0A6BF0 for
>> Heap2/CLEAN: remxid 1128
>> WAL record received at 2020-07-11 12:00:41.092231+00
>> tbs 1663 db 13586 rel 16805, fork main, blkno 0
>>>> 2020-07-06 06:10:36.022 UTC [14035] DETAIL: WAL record received at
>>>> 2020-07-06 06:10:36.021963+00.
>>>> Tablespace/database/relation are 1663/13586/16672, fork is main
>>>> and block is 0.
>>>> There is 1 blocking connection(s).
>>> To follow the existing log message, perhaps this can be something like
>>> "WAL record received at %s, rel %u/%u/%u, fork %s, blkno %u. %d
>>> processes". But I'm not sure the errdetail is the best place to
>>> display the WAL information as I mentioned in the latter part of this
>>> email.
>> moved to the context and formatted the same way as the current
>> Standby/LOCK context.
>>
>>
>>> Ok. Let's keep this thread for the new attached patch that focus on the
>>>> recovery process waiting.
>>> Thank you for updating the patch!
>>>
>>> I've tested the latest patch.
>>
>> Thank you for testing and reviewing!
>>
>>
>>> On recovery conflict on lock and on
>>> bufferpin, if max_standby_streaming_delay is disabled (set to -1), the
>>> logs don't appear even if log_recovery_conflicts is true.
>>
>> Nice catch! it is fixed in the new attached patch (the log reporting has
>> been moved out of StandbyTimeoutHandler()).
>>
>>> Here is random comments on the code:
>>>
>>> + recovery_conflict_main_message = psprintf("waiting for
>>> recovery conflict on %s",
>>> +
>>> get_procsignal_reason_desc(reason));
>>> :
>>> + ereport(LOG,
>>> + (errmsg("%s", recovery_conflict_main_message),
>>> + errdetail("%s\n" "There is %d blocking
>>> connection(s).", wal_record_detail_str, num_waitlist_entries)));
>>>
>>> It's not translation-support-friendly. I think the message "waiting
>>> for recovery conflict on %s" should be surrounded by _(). Or we can
>>> just put it to ereport as follows:
>>>
>>> ereport(LOG,
>>> (errmsg("waiting for recovery conflicts on %s",
>>> get_procsignal_reason_desc(reason))
>>> ...
>>
>> changed in the new attached patch.
>>
>>
>>> ---
>>> + oldcontext = MemoryContextSwitchTo(ErrorContext);
>>> + econtext = error_context_stack;
>>> +
>>> + if (XLogRecGetBlockTag(econtext->arg, 0, &rnode, &forknum, &blknum))
>>>
>>> I don't think it's a good idea to rely on error_context_stack because
>>> other codes might set another error context before reaching here in
>>> the future.
>>
>> right, changed in the new attached patch: this is now done in
>> rm_redo_error_callback() and using the XLogReaderState passed as argument.
>>
>>
>>> ---
>>> + if (XLogRecGetBlockTag(econtext->arg, 0, &rnode, &forknum, &blknum))
>>> + wal_record_detail_str = psprintf("WAL record received
>>> at %s.\nTablespace/database/relation are %u/%u/%u, fork is %s and
>>> block is %u.",
>>> + receipt_time_str,
>>> rnode.spcNode, rnode.dbNode, rnode.relNode,
>>> + forkNames[forknum],
>>> + blknum);
>>>
>>> There might be a block tag in block ids other than 0.
>>
>> right, fixed in the new attached patch.
>>
>>
>>> I'm not sure the
>>> errdetail is the best place where we display WAL information.
>>
>> moved to context in the new attached patch.
>>
>>
>>> For
>>> instance, we display both the relation oid and block number depending
>>> on RM as follows:
>>>
>>> 2020-07-07 15:50:30.733 JST [13344] LOG: waiting for recovery conflict on lock
>>> 2020-07-07 15:50:30.733 JST [13344] DETAIL: WAL record received at
>>> 2020-07-07 15:50:27.73378+09.
>>> There is 1 blocking connection(s).
>>> 2020-07-07 15:50:30.733 JST [13344] CONTEXT: WAL redo at 0/3000028
>>> for Standby/LOCK: xid 506 db 13586 rel 16384
>>>
>>> I wonder if we can display the details of redo WAL information by
>>> improving xlog_outdesc() or rm_redo_error_callback() so that it
>>> displays relfilenode, forknum, and block number. What do you think?
>>
>> I think that fully make sense to move this to rm_redo_error_callback().
>>
>> This is where the information is now been displayed in the new attached
>> patch.
>>
>>
>>> ---
>>> + /* display wal record information */
>>> + if (log_recovery_conflicts)
>>> + LogBlockedWalRecordInfo(num_waitlist_entries, reason);
>>>
>>> I'm concerned that we should log the recovery conflict information
>>> every time when a recovery conflict happens from the perspective of
>>> overheads and the amount of the logs. Can we logs that information
>>> after waiting for deadlock_timeouts secs like log_lock_waits or
>>> waiting for the fixed duration?
>>
>> The new attached patch is now waiting for deadlock_timeout duration.
>>
>>> ---
>>> @@ -609,6 +682,10 @@ StandbyTimeoutHandler(void)
>>> /* forget any pending STANDBY_DEADLOCK_TIMEOUT request */
>>> disable_timeout(STANDBY_DEADLOCK_TIMEOUT, false);
>>>
>>> + /* display wal record information */
>>> + if (log_recovery_conflicts)
>>> + LogBlockedWalRecordInfo(-1,
>>> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
>>> +
>>> SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
>>> }
>>>
>>> ResolveRecoveryConflictWithBufferPin() which sets a timer to call
>>> StandbyTimeoutHandler() can be called multiple times even if the
>>> recovery is waiting for one buffer pin. I think we should avoid
>>> logging the same contents multiple times.
>>
>> I do agree, only the first pass is now been logged.
>>
>>
>>> ---
>>> -
>>> + {
>>> + {"log_recovery_conflicts", PGC_SUSET, LOGGING_WHAT,
>>> + gettext_noop("Logs standby recovery conflicts."),
>>> + NULL
>>> + },
>>> + &log_recovery_conflicts,
>>> + true,
>>> + NULL, NULL, NULL
>>> + },
>>>
>>> Other logging parameters such as log_lock_waits is false by default. I
>>> think this parameter can also be false by default but is there any
>>> reason to enable it by default?
>>
>> now set to false by default.
>>
>>
>>> Since this parameter applies only to the startup process, I think it
>>> should be PGC_SIGHUP.
>>
>> changed that way.
>>
>>
>>> ---
>>> + /* display wal record information */
>>> + if (log_recovery_conflicts)
>>> + LogBlockedWalRecordInfo(CountDBBackends(dbid),
>>> PROCSIG_RECOVERY_CONFLICT_DATABASE);
>>>
>>> We log the recovery conflict into the server log but we don't update
>>> the process title to append "waiting". While discussing the process
>>> title update on recovery conflict, I got the review comment[1] that we
>>> don't need to update the process title because no wait occurs when
>>> recovery conflict with database happens. As the comment says, recovery
>>> is canceling the existing processes on the database being removed, but
>>> not waiting for something.
>>
>> Ok, I keep reporting the conflict but changed the wording for this
>> particular case.
> Thank you for updating the patch! Here is my comments on the latest
> version patch:
>
> + /*
> + * Ensure we are in the startup process
> + * if we want to log standby recovery conflicts
> + */
> + if (AmStartupProcess() && log_recovery_conflicts)
> + {
>
> This function must be used by only the startup process. Not sure but
> if we need this check I think we should use an assertion rather than a
> condition in if statement.
>
> I think the information about relfilenode and forknumber is useful
> even in a normal recovery case. I think we can always add this
> information to errcontext. What do you think?
Fully agree, that it could be useful outside the context of this patch.
>
> + GetXLogReceiptTime(&rtime, &fromStream);
> + if (fromStream)
> + {
> + receipt_time_str = pstrdup(timestamptz_to_str(rtime));
> + appendStringInfo(&buf,"\nWAL record received at %s",
> receipt_time_str);
>
> Not sure showing the receipt time of WAL record is useful for users in
> this case. IIUC that the receipt time doesn't correspond to individual
> WAL records whereas the errcontext information is about the particular
> WAL record.
>
> + for (block_id = 0; block_id <= record->max_block_id; block_id++)
> + {
> + if (XLogRecGetBlockTag(record, block_id, &rnode,
> &forknum, &blknum))
> + appendStringInfo(&buf,"\ntbs %u db %u rel %u, fork
> %s, blkno %u",
> + rnode.spcNode, rnode.dbNode, rnode.relNode,
> + forkNames[forknum],
> + blknum);
>
> How about showing something like pg_waldump?
>
> CONTEXT: WAL redo at 0/3059100 for Heap2/CLEAN: remxid 506, blkref
> #0: rel 1000/20000/1234 fork main blk 10, blkref #1: rel
> 1000/20000/1234 fork main blk 11
>
> or
>
> CONTEXT: WAL redo at 0/3059100 for Heap2/CLEAN: remxid 506
> blkref #0: rel 1000/20000/1234 fork main blk 10
> blkref #1: rel 1000/20000/1234 fork main blk 11
>
> But the latter format makes grepping difficult.
>
> Also, I guess the changes in rm_redo_error_callback can also be a
> separate patch.

I fully agree with your comments.

As you have already seen I created a new patch dedicated to the
rm_redo_error_callback() changes:
https://commitfest.postgresql.org/29/2668/.

Then those changes are not part of the new attached patch related to
this thread anymore.

>
> +/*
> + * Display information about the wal record
> + * apply being blocked
> + */
> +static void
> +LogBlockedWalRecordInfo(int num_waitlist_entries, ProcSignalReason reason)
>
> I think the function name needs to be updated. The function no longer
> logs WAL record information.
right.
>
> +{
> + if (num_waitlist_entries > 0)
> + if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
> + ereport(LOG,
> + (errmsg("recovery is experiencing recovery conflict on
> %s", get_procsignal_reason_desc(reason)),
> + errdetail("There is %d conflicting connection(s).",
> num_waitlist_entries)));
> + else
> + ereport(LOG,
> + (errmsg("recovery is waiting for recovery conflict on
> %s", get_procsignal_reason_desc(reason)),
> + errdetail("There is %d blocking connection(s).",
> num_waitlist_entries)));
> + else
> + ereport(LOG,
> + (errmsg("recovery is waiting for recovery conflict on %s",
> get_procsignal_reason_desc(reason))));
> +}
>
> How about displaying actual virtual transaction ids or process ids the
> startup process is waiting for? For example, the log is going to be:
>
> LOG: recovery is waiting for recovery conflict on snapshot
> DETAIL: Conflicting virtual transaction ids: 100/101, 200/202, 300/303

The new attached patch adds the information related to the virtual
transaction ids, so that the outcome is now:

2020-08-10 13:31:00.485 UTC [1760] LOG:  recovery is resolving recovery
conflict on snapshot
2020-08-10 13:31:00.485 UTC [1760] DETAIL:  Conflicting virtual
transaction ids: 4/2, 3/2.

>
> or
>
> LOG: recovery is waiting for recovery conflict on snapshot
> DETAIL: Conflicting processes: 123, 456, 789
I think it makes more sense to display the conflicting virtual
transaction id(s), as the blocking process(es) will be known during the
statement cancellation.
>
> FYI, errdetail_plural() or errdetail_log_plural() can be used for pluralization.
>
> + tmpWaitlist = waitlist;
> + while (VirtualTransactionIdIsValid(*tmpWaitlist))
> + {
> + tmpWaitlist++;
> + }
> +
> + num_waitlist_entries = (tmpWaitlist - waitlist);
> +
> + /* display wal record information */
> + if (log_recovery_conflicts &&
> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
> GetCurrentTimestamp(),
> + DeadlockTimeout))) {
> + LogBlockedWalRecordInfo(num_waitlist_entries, reason);
> + recovery_conflicts_log_time = GetCurrentTimestamp();
> + }
>
> recovery_conflicts_log_time is not initialized. And shouldn't we
> compare the current timestamp to the timestamp when the startup
> process started waiting?
>
> I think we should call LogBlockedWalRecordInfo() inside of the inner
> while loop rather than at the beginning of
> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases, the
> startup process waits until 'ltime', then enters
> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
> snapshot and tablespace conflict cases (i.g.
> ResolveRecoveryConflictWithSnapshot() and
> ResolveRecoveryConflictWithTablespace()), it enters
> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
> reaching ‘ltime’ inside of the inner while look. So the above
> condition could always be false.

That would make the information being displayed after
max_standby_streaming_delay is reached for the multiple cases you just
described.

It indeed makes more sense to call the function in the inside of the
inner loop: this is done in the new attached patch.

>
> I wonder if we can have something like the following function:
>
> bool
> LogRecoveryConflict(VirtualTransactionId *waitlist, ProcSignalReason reason,
> TimestampTz wait_start)
> {
> if (!TimestampDifferenceExceeds(wait_start, GetCurrentTimestamp(),
>
> max_standby_streaming_delay))
> return false;
>
> if (waitlist)
> {
> char *buf;
>
> buf = construct a string containing all process ids (or
> virtual transaction ids) to resolve from waitlist;
>
> ereport(LOG,
> (errmsg("recovery is waiting recovery conflict on %s",
> get_procsignal_reason_desc(reason)),
> (errdetail_log_plural("Conflicting process : %s.",
> "Conflicting processes : %s.",
> the number of processes in *waitlist, buf))));
> }
> else
> ereport(LOG,
> (errmsg("recovery is resolving recovery conflict on %s",
> get_procsignal_reason_desc(reason))));
>
> return true;
> }
>
> 'wait_start' is the timestamp when the caller started waiting for the
> recovery conflict. This function logs resolving recovery conflict with
> detailed information if needed. The caller call this function when
> log_recovery_conflicts is enabled as follows:
>
> bool logged = false;
> :
> if (log_recovery_conflicts && !logged)
> logged = LogRecoveryConflict(waitlist, reason, waitStart);

It's more or less what the new attached patch is doing, except that it
deals with the waitStart comparison in the while loop (to make the
changes to the LogRecoveryConflict() calls outside
ResolveRecoveryConflictWithVirtualXIDs() simpler).

For the buffer pin case, I moved the call to LogRecoveryConflict to
StandbyTimeoutHandler (instead of the beginning of
ResolveRecoveryConflictWithBufferPin).

That way we are consistent across all the cases: we are logging once the
conflict resolution is actually happening (means we are ready/about to
cancel what is needed).

Then, the LogRecoveryConflict() is now not reporting that "recovery is
waiting recovery conflict" anymore but that "recovery is resolving
recovery conflict" in all the cases.

>
> + <varlistentry id="guc-log-recovery-conflicts"
> xreflabel="log_recovery_conflicts">
> + <term><varname>log_recovery_conflicts</varname> (<type>boolean</type>)
> + <indexterm>
> + <primary><varname>log_recovery_conflicts</varname>
> configuration parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + Controls whether detailed information is produced when a conflict
> + occurs during standby recovery. The default is <literal>on</literal>.
> + Only superusers can change this setting.
> + </para>
> + </listitem>
> + </varlistentry>
>
> I think the documentation needs to be updated.
Right, updated that way in the new attached patch.
>
> IIUC this feature is to logging the startup process is waiting due to
> recovery conflict or waiting for recovery conflict resolution. But it
> logs only when the startup process waits for longer than a certain
> time. I think we can improve the documentation in terms of that point.
> Also, the last sentence is no longer true; log_recovery_conflicts is
> now PGC_SIGHUP.
>
> BTW 'log_recovery_conflict_waits' might be better name for consistency
> with log_lock_waits?

As per the changes in this new patch, we are not logging when the
startup process starts waiting anymore, but only during the conflict
resolution.

I changed the GUC to "log_recovery_conflicts_resolution" instead (based
on my comment above).

Thanks,

Bertrand

Attachment Content-Type Size
v1-0003-Add-Standby-Recovery-Conflicts-BlockedOnly-Info.patch text/plain 8.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-08-10 14:44:21 Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
Previous Message Robert Haas 2020-08-10 14:37:09 nested queries vs. pg_stat_activity