Re: Add Information during standby recovery conflicts

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-08-27 11:58:38
Message-ID: 05621ec8-6e62-25db-14c4-50079304617d@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 8/27/20 10:16 AM, Masahiko Sawada wrote
>
> On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>> Hi,
>>
>> On 7/31/20 7:12 AM, Masahiko Sawada wrote:
>>> + tmpWaitlist = waitlist;
>>> + while (VirtualTransactionIdIsValid(*tmpWaitlist))
>>> + {
>>> + tmpWaitlist++;
>>> + }
>>> +
>>> + num_waitlist_entries = (tmpWaitlist - waitlist);
>>> +
>>> + /* display wal record information */
>>> + if (log_recovery_conflicts &&
>>> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
>>> GetCurrentTimestamp(),
>>> + DeadlockTimeout))) {
>>> + LogBlockedWalRecordInfo(num_waitlist_entries, reason);
>>> + recovery_conflicts_log_time = GetCurrentTimestamp();
>>> + }
>>>
>>> recovery_conflicts_log_time is not initialized. And shouldn't we
>>> compare the current timestamp to the timestamp when the startup
>>> process started waiting?
>>>
>>> I think we should call LogBlockedWalRecordInfo() inside of the inner
>>> while loop rather than at the beginning of
>>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases, the
>>> startup process waits until 'ltime', then enters
>>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
>>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
>>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
>>> snapshot and tablespace conflict cases (i.g.
>>> ResolveRecoveryConflictWithSnapshot() and
>>> ResolveRecoveryConflictWithTablespace()), it enters
>>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
>>> reaching ‘ltime’ inside of the inner while look. So the above
>>> condition could always be false.
>> That would make the information being displayed after
>> max_standby_streaming_delay is reached for the multiple cases you just
>> described.
> Sorry, it should be deadlock_timeout, not max_standby_streaming_delay.
> Otherwise, the recovery conflict log message is printed when
> resolution, which seems not to achieve the original purpose. Am I
> missing something?

Ok, I understand where the confusion is coming from.

Indeed the new version is now printing the recovery conflict log message
during the conflict resolution (while the initial intention was to be
warned as soon as the replay had to wait).

The advantage of the new version is that it would be consistent across
all the conflicts scenarios (if not, we would get messages during the
resolution or when the replay started waiting, depending of the conflict
scenario).

On the other hand, the cons of the new version is that we would miss
messages when no resolution is needed (replay wait duration <
max_standby_streaming_delay), but is that really annoying? (As no
cancellation would occur)

Thinking about it, i like the new version (being warned during the
resolution) as we would get messages only when cancelation will occur
(which is what the user might want to avoid, so the extra info would be
useful).

What do you think?

Bertrand

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-08-27 12:12:49 Re: Parallel copy
Previous Message Sachin Khanna 2020-08-27 11:51:09 Please help for error ( file <libxml/parser.h> is required for XML support )