Re: Add Information during standby recovery conflicts

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-08-28 05:03:38
Message-ID: CA+fd4k4kqcSxuWpLOESwGsEnKLvdEpEKbspLCD5uf9SsE8bEAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 27 Aug 2020 at 20:58, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>
>
> On 8/27/20 10:16 AM, Masahiko Sawada wrote
> >
> > On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> >> Hi,
> >>
> >> On 7/31/20 7:12 AM, Masahiko Sawada wrote:
> >>> + tmpWaitlist = waitlist;
> >>> + while (VirtualTransactionIdIsValid(*tmpWaitlist))
> >>> + {
> >>> + tmpWaitlist++;
> >>> + }
> >>> +
> >>> + num_waitlist_entries = (tmpWaitlist - waitlist);
> >>> +
> >>> + /* display wal record information */
> >>> + if (log_recovery_conflicts &&
> >>> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
> >>> GetCurrentTimestamp(),
> >>> + DeadlockTimeout))) {
> >>> + LogBlockedWalRecordInfo(num_waitlist_entries, reason);
> >>> + recovery_conflicts_log_time = GetCurrentTimestamp();
> >>> + }
> >>>
> >>> recovery_conflicts_log_time is not initialized. And shouldn't we
> >>> compare the current timestamp to the timestamp when the startup
> >>> process started waiting?
> >>>
> >>> I think we should call LogBlockedWalRecordInfo() inside of the inner
> >>> while loop rather than at the beginning of
> >>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases, the
> >>> startup process waits until 'ltime', then enters
> >>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
> >>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
> >>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
> >>> snapshot and tablespace conflict cases (i.g.
> >>> ResolveRecoveryConflictWithSnapshot() and
> >>> ResolveRecoveryConflictWithTablespace()), it enters
> >>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
> >>> reaching ‘ltime’ inside of the inner while look. So the above
> >>> condition could always be false.
> >> That would make the information being displayed after
> >> max_standby_streaming_delay is reached for the multiple cases you just
> >> described.
> > Sorry, it should be deadlock_timeout, not max_standby_streaming_delay.
> > Otherwise, the recovery conflict log message is printed when
> > resolution, which seems not to achieve the original purpose. Am I
> > missing something?
>
> Ok, I understand where the confusion is coming from.
>
> Indeed the new version is now printing the recovery conflict log message
> during the conflict resolution (while the initial intention was to be
> warned as soon as the replay had to wait).
>
> The advantage of the new version is that it would be consistent across
> all the conflicts scenarios (if not, we would get messages during the
> resolution or when the replay started waiting, depending of the conflict
> scenario).
>
> On the other hand, the cons of the new version is that we would miss
> messages when no resolution is needed (replay wait duration <
> max_standby_streaming_delay), but is that really annoying? (As no
> cancellation would occur)
>
> Thinking about it, i like the new version (being warned during the
> resolution) as we would get messages only when cancelation will occur
> (which is what the user might want to avoid, so the extra info would be
> useful).
>
> What do you think?

Hmm, I think we print the reason why backends are canceled even of as
now by ProcessInterrupts(). With this patch and related patches you
proposed on another thread, the startup process reports virtual xids
being interrupted, the reason, and LSN of blocked WAL, then processes
will also report its virtual xid and reason. Therefore, the new
information added by these patches is only the LSN of blocked WAL.
Also, the blocked WAL would be unblocked just after the startup
process reports the resolution message. What use cases are you
assuming?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2020-08-28 05:07:45 Re: new heapcheck contrib module
Previous Message Ashutosh Sharma 2020-08-28 03:40:07 Re: Should we replace the checks for access method OID with handler OID?