Re: Add Information during standby recovery conflicts

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-08-28 14:14:25
Message-ID: 0fb96369-cc7c-cddb-a474-ca3d29904123@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 8/28/20 7:03 AM, Masahiko Sawada wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Thu, 27 Aug 2020 at 20:58, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>>
>> On 8/27/20 10:16 AM, Masahiko Sawada wrote
>>> On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>>>> Hi,
>>>>
>>>> On 7/31/20 7:12 AM, Masahiko Sawada wrote:
>>>>> + tmpWaitlist = waitlist;
>>>>> + while (VirtualTransactionIdIsValid(*tmpWaitlist))
>>>>> + {
>>>>> + tmpWaitlist++;
>>>>> + }
>>>>> +
>>>>> + num_waitlist_entries = (tmpWaitlist - waitlist);
>>>>> +
>>>>> + /* display wal record information */
>>>>> + if (log_recovery_conflicts &&
>>>>> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
>>>>> GetCurrentTimestamp(),
>>>>> + DeadlockTimeout))) {
>>>>> + LogBlockedWalRecordInfo(num_waitlist_entries, reason);
>>>>> + recovery_conflicts_log_time = GetCurrentTimestamp();
>>>>> + }
>>>>>
>>>>> recovery_conflicts_log_time is not initialized. And shouldn't we
>>>>> compare the current timestamp to the timestamp when the startup
>>>>> process started waiting?
>>>>>
>>>>> I think we should call LogBlockedWalRecordInfo() inside of the inner
>>>>> while loop rather than at the beginning of
>>>>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases, the
>>>>> startup process waits until 'ltime', then enters
>>>>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
>>>>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
>>>>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
>>>>> snapshot and tablespace conflict cases (i.g.
>>>>> ResolveRecoveryConflictWithSnapshot() and
>>>>> ResolveRecoveryConflictWithTablespace()), it enters
>>>>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
>>>>> reaching ‘ltime’ inside of the inner while look. So the above
>>>>> condition could always be false.
>>>> That would make the information being displayed after
>>>> max_standby_streaming_delay is reached for the multiple cases you just
>>>> described.
>>> Sorry, it should be deadlock_timeout, not max_standby_streaming_delay.
>>> Otherwise, the recovery conflict log message is printed when
>>> resolution, which seems not to achieve the original purpose. Am I
>>> missing something?
>> Ok, I understand where the confusion is coming from.
>>
>> Indeed the new version is now printing the recovery conflict log message
>> during the conflict resolution (while the initial intention was to be
>> warned as soon as the replay had to wait).
>>
>> The advantage of the new version is that it would be consistent across
>> all the conflicts scenarios (if not, we would get messages during the
>> resolution or when the replay started waiting, depending of the conflict
>> scenario).
>>
>> On the other hand, the cons of the new version is that we would miss
>> messages when no resolution is needed (replay wait duration <
>> max_standby_streaming_delay), but is that really annoying? (As no
>> cancellation would occur)
>>
>> Thinking about it, i like the new version (being warned during the
>> resolution) as we would get messages only when cancelation will occur
>> (which is what the user might want to avoid, so the extra info would be
>> useful).
>>
>> What do you think?
> Hmm, I think we print the reason why backends are canceled even of as
> now by ProcessInterrupts(). With this patch and related patches you
> proposed on another thread, the startup process reports virtual xids
> being interrupted, the reason, and LSN of blocked WAL, then processes
> will also report its virtual xid and reason. Therefore, the new
> information added by these patches is only the LSN of blocked WAL.

That's completely right, let's come back to the original intention of
this patch (means, don't wait for the conflict resolution to log messages).

I'll submit a new version once updated.

Thanks!

Bertrand

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-28 14:38:56 Re: recovering from "found xmin ... from before relfrozenxid ..."
Previous Message Ranier Vilela 2020-08-28 14:13:28 Re: Please help for error ( file <libxml/parser.h> is required for XML support )