Re: Add Information during standby recovery conflicts

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-07-07 07:43:27
Message-ID: CA+fd4k5ftD-OZPA2K0gPJxDv12mxAknAM2LVTjMRFbjZCC2FhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 6 Jul 2020 at 15:42, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>
>
> On 7/3/20 4:20 AM, Masahiko Sawada wrote:
> > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >
> >
> >
> > On Wed, 1 Jul 2020 at 21:52, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> >> Hey,
> >>
> >> On 6/29/20 10:55 AM, Masahiko Sawada wrote:
> >>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >>>
> >>>
> >>>
> >>> On Thu, 18 Jun 2020 at 16:28, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> >>>> Hi hackers,
> >>>>
> >>>> I've attached a patch to add information during standby recovery conflicts.
> >>>> The motive behind is to be able to get some information:
> >>>>
> >>>> On the apply side
> >>>> On the blocker(s) side
> >>>>
> >>>> Motivation:
> >>>>
> >>>> When a standby recovery conflict occurs it could be useful to get more information to be able to dive deep on the root cause and find a way to avoid/mitigate new occurrences.
> >>> I think this is a good feature. Like log_lock_waits, it will help the
> >>> users to investigate recovery conflict issues.
> >> exactly, thanks for looking at it.
> >>>> Adding this information would make the investigations easier, it could help answering questions like:
> >>>>
> >>>> On which LSN was the WAL apply blocked?
> >>>> What was the purpose of the bocked WAL record?
> >>>> On which relation (if any) was the blocked WAL record related to?
> >>>> What was the blocker(s) doing?
> >>>> When did the blocker(s) started their queries (if any)?
> >>>> What was the blocker(s) waiting for? on which wait event?
> >>>>
> >>>> Technical context and proposal:
> >>>>
> >>>> There is 2 points in this patch:
> >>>>
> >>>> Add the information about the blocked WAL record. This is done in standby.c (ResolveRecoveryConflictWithVirtualXIDs, ResolveRecoveryConflictWithDatabase, StandbyTimeoutHandler)
> >>> I think we already have the information about the WAL record being
> >>> applied in errcontext.
> >> right, but it won’t be displayed in case log_error_verbosity is set to
> >> terse.
> > Yes. I think in this case errcontext or errdetail is more appropriate
> > for this information in this case. Otherwise, we will end up emitting
> > same WAL record information twice in log_error_verbosity = verbose.
> >
> > For instance, here is the server logs when I tested a recovery
> > conflict on buffer pin but it looks redundant:
> >
> > 2020-07-03 11:01:15.339 JST [60585] LOG: wal record apply is blocked
> > by 1 connection(s), reason: User query might have needed to see row
> > versions that must be removed.
> > 2020-07-03 11:01:15.339 JST [60585] CONTEXT: WAL redo at 0/30025E0
> > for Heap2/CLEAN: remxid 505
> > 2020-07-03 11:01:15.339 JST [60585] LOG: blocked wal record rmgr:
> > Heap2, lsn: 0/030025E0, received at: 2020-07-03 11:01:15.338997+09,
> > desc: CLEAN, relation: rel 1663/12930/16384 fork main blk 0
> > 2020-07-03 11:01:15.339 JST [60585] CONTEXT: WAL redo at 0/30025E0
> > for Heap2/CLEAN: remxid 505
> > 2020-07-03 11:01:15.347 JST [60604] LOG: about to be interrupted:
> > backend_type: client backend, state: active, wait_event: PgSleep,
> > query_start: 2020-07-03 11:01:14.337708+09
> > 2020-07-03 11:01:15.347 JST [60604] STATEMENT: select pg_sleep(30);
> > 2020-07-03 11:01:15.347 JST [60604] ERROR: canceling statement due to
> > conflict with recovery
> > 2020-07-03 11:01:15.347 JST [60604] DETAIL: User query might have
> > needed to see row versions that must be removed.
> > 2020-07-03 11:01:15.347 JST [60604] STATEMENT: select pg_sleep(30);
> >
> > There are the information WAL record three times and the reason for
> > interruption twice.
>
>
> Fully makes sense, the new patch version attached is now producing:
>
> 2020-07-06 06:10:36.022 UTC [14035] LOG: waiting for recovery conflict
> on snapshot

How about adding the subject? that is, "recovery is waiting for
recovery conflict on %s" or "recovery process <pid> is waiting for
conflict on %s".

> 2020-07-06 06:10:36.022 UTC [14035] DETAIL: WAL record received at
> 2020-07-06 06:10:36.021963+00.
> Tablespace/database/relation are 1663/13586/16672, fork is main
> and block is 0.
> There is 1 blocking connection(s).

To follow the existing log message, perhaps this can be something like
"WAL record received at %s, rel %u/%u/%u, fork %s, blkno %u. %d
processes". But I'm not sure the errdetail is the best place to
display the WAL information as I mentioned in the latter part of this
email.

> 2020-07-06 06:10:36.022 UTC [14035] CONTEXT: WAL redo at 0/3A05708 for
> Heap2/CLEAN: remxid 972
>
> It does not provide the pid(s) of the blocking processes as they'll
> appear during the interruption(s).
>
> >> Or did you had in mind to try to avoid using the new
> >> “current_replayed_xlog” in xlog.c?
> > Regarding LogBlockedWalRecordInfo(), it seems to me that the main
> > message of this logging is to let users know both that the process got
> > stuck due to recovery conflict and its reason (lock, database,
> > bufferpin etc). Other information such as the details of blocked WAL,
> > how many processes are blocking seems like additional information. So
> > I think this information would be better to be shown errcontext or
> > errdetails and we don’t need to create a similar feature as we already
> > show the WAL record being replayed in errcontext.
>
>
> I got rid of current_replayed_xlog in the new patch attached and make
> use of the errcontext argument instead.
>
> >
> > Also, this function emits two LOG messages related to each other but
> > I'm concerned that it can be hard for users to associate these
> > separate log lines especially when server logs are written at a high
> > rate. And I think these two log lines don't follow our error message
> > style[1].
> >
> >>> I wonder if we can show the recovery conflict information in the main
> >>> LOG message, the blocker information in errdetail, and use errcontext
> >>> with regard to WAL record information. For example:
> >>>
> >>> LOG: process 500 waiting for recovery conflict on snapshot
> >>> DETAIL: conflicting transition id: 100, 200, 300
> >>> CONTEXT: WAL redo at 0/3001970 for Heap2/CLEAN: remxid 506
> >> Not sure at all if it would be possible to put all the information at
> >> the same time.
> >> For example in case of shared buffer pin conflict, the blocker is
> >> currently known “only” during the RecoveryConflictInterrupt call (so
> >> still not known yet when we can “already” report the blocked LSN
> >> information).
> >> It might also happen that the blocker(s) will never get interrupted (was
> >> not blocking anymore before max_standby_streaming_delay has been
> >> reached): then it would not be possible to display all the information
> >> here (aka when it is interrupted) while we still want to be warned that
> >> the WAL replay has been blocked.
> > I think it's okay with showing different information for different
> > types of recovery conflict. In buffer pin conflict case, I think we
> > can let the user know the process is waiting for recovery conflict on
> > buffer pin in the main message and the WAL record being replayed in
> > errdetails.
> >
> >>>> Outcome Example:
> >>>>
> >>>> 2020-06-15 06:48:23.774 UTC [6971] LOG: wal record apply is blocked by 2 connection(s), reason: User query might have needed to see row versions that must be removed.
> >>>> 2020-06-15 06:48:23.774 UTC [6971] LOG: blocked wal record rmgr: Heap2, lsn: 0/038E2678, received at: 2020-06-15 06:48:23.774098+00, desc: CLEAN, relation: rel 1663/13586/16652 fork main blk 0
> >>>> 2020-06-15 06:48:54.773 UTC [7088] LOG: about to interrupt pid: 7088, backend_type: client backend, state: active, wait_event_type: Timeout, wait_event: PgSleep, query_start: 2020-06-15 06:48:14.87672+00
> >>> I'm concerned that the above information on the process who is about
> >>> to be interrupted is very detailed but I'm not sure it will be helpful
> >>> for the users. If the blocker is waiting on something lock, the
> >>> information should be logged by log_lock_waits.
> >> The blocker could also be in “idle in transaction”, say for example, on
> >> the standby (with hot_standby_feedback set to off):
> >>
> >> standby> begin;
> >> standby> SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> >> standby> select * from bdt;
> >>
> >> on the master:
> >>
> >> master> update bdt set generate_series = 15;
> >> master> vacuum verbose bdt;
> >>
> >> would produce:
> >>
> >> 2020-07-01 09:18:55.256 UTC [32751] LOG: about to interrupt pid: 32751,
> >> backend_type: client backend, state: idle in transaction,
> >> wait_event_type: Client, wait_event: ClientRead, query_start: 2020-07-01
> >> 09:18:17.390572+00
> >> 2020-07-01 09:18:55.256 UTC [32751] DETAIL: statement: select * from bdt;
> >>
> >> I think those information are useful to have (might get rid of
> >> wait_event_type though: done in the new attached patch).
> > Since the backend cancels query at a convenient time
> > (CHECK_FOR_INTERRUPT()), I'm concerned that the wait event information
> > displayed in that log might not be helpful. For example, even if the
> > blocker is getting stuck on disk I/O (WAIT_EVENT_BUFFILE_READ etc)
> > while holding a lock that the recovery process is waiting for, when
> > the blocker is able to cancel the query it's no longer waiting for
> > disk I/O. Also, regarding displaying the backend type, we already show
> > the backend type by setting %d in log_line_prefix.
> >
> > I still think we should show this additional information (wait event,
> > query start etc) in errdetails even if we want to show this
> > information in the server logs. Perhaps we can improve
> > errdetail_recovery_conflict()?
> >
> > Aside from the above comments from the perspective of high-level
> > design, I think we can split this patch into two patches: logging the
> > recovery process is waiting (adding log_recovery_conficts) and logging
> > the details of blockers who is about to be interrupted. I think it'll
> > make review easy.
>
>
> Ok. Let's keep this thread for the new attached patch that focus on the
> recovery process waiting.

Thank you for updating the patch!

I've tested the latest patch. On recovery conflict on lock and on
bufferpin, if max_standby_streaming_delay is disabled (set to -1), the
logs don't appear even if log_recovery_conflicts is true.

Here is random comments on the code:

+ recovery_conflict_main_message = psprintf("waiting for
recovery conflict on %s",
+
get_procsignal_reason_desc(reason));
:
+ ereport(LOG,
+ (errmsg("%s", recovery_conflict_main_message),
+ errdetail("%s\n" "There is %d blocking
connection(s).", wal_record_detail_str, num_waitlist_entries)));

It's not translation-support-friendly. I think the message "waiting
for recovery conflict on %s" should be surrounded by _(). Or we can
just put it to ereport as follows:

ereport(LOG,
(errmsg("waiting for recovery conflicts on %s",
get_procsignal_reason_desc(reason))
...

---
+ oldcontext = MemoryContextSwitchTo(ErrorContext);
+ econtext = error_context_stack;
+
+ if (XLogRecGetBlockTag(econtext->arg, 0, &rnode, &forknum, &blknum))

I don't think it's a good idea to rely on error_context_stack because
other codes might set another error context before reaching here in
the future.

---
+ if (XLogRecGetBlockTag(econtext->arg, 0, &rnode, &forknum, &blknum))
+ wal_record_detail_str = psprintf("WAL record received
at %s.\nTablespace/database/relation are %u/%u/%u, fork is %s and
block is %u.",
+ receipt_time_str,
rnode.spcNode, rnode.dbNode, rnode.relNode,
+ forkNames[forknum],
+ blknum);

There might be a block tag in block ids other than 0. I'm not sure the
errdetail is the best place where we display WAL information. For
instance, we display both the relation oid and block number depending
on RM as follows:

2020-07-07 15:50:30.733 JST [13344] LOG: waiting for recovery conflict on lock
2020-07-07 15:50:30.733 JST [13344] DETAIL: WAL record received at
2020-07-07 15:50:27.73378+09.
There is 1 blocking connection(s).
2020-07-07 15:50:30.733 JST [13344] CONTEXT: WAL redo at 0/3000028
for Standby/LOCK: xid 506 db 13586 rel 16384

I wonder if we can display the details of redo WAL information by
improving xlog_outdesc() or rm_redo_error_callback() so that it
displays relfilenode, forknum, and block number. What do you think?

---
+ /* display wal record information */
+ if (log_recovery_conflicts)
+ LogBlockedWalRecordInfo(num_waitlist_entries, reason);

I'm concerned that we should log the recovery conflict information
every time when a recovery conflict happens from the perspective of
overheads and the amount of the logs. Can we logs that information
after waiting for deadlock_timeouts secs like log_lock_waits or
waiting for the fixed duration?

---
@@ -609,6 +682,10 @@ StandbyTimeoutHandler(void)
/* forget any pending STANDBY_DEADLOCK_TIMEOUT request */
disable_timeout(STANDBY_DEADLOCK_TIMEOUT, false);

+ /* display wal record information */
+ if (log_recovery_conflicts)
+ LogBlockedWalRecordInfo(-1,
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+
SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
}

ResolveRecoveryConflictWithBufferPin() which sets a timer to call
StandbyTimeoutHandler() can be called multiple times even if the
recovery is waiting for one buffer pin. I think we should avoid
logging the same contents multiple times.

---
-
+ {
+ {"log_recovery_conflicts", PGC_SUSET, LOGGING_WHAT,
+ gettext_noop("Logs standby recovery conflicts."),
+ NULL
+ },
+ &log_recovery_conflicts,
+ true,
+ NULL, NULL, NULL
+ },

Other logging parameters such as log_lock_waits is false by default. I
think this parameter can also be false by default but is there any
reason to enable it by default?

Since this parameter applies only to the startup process, I think it
should be PGC_SIGHUP.

---
+ /* display wal record information */
+ if (log_recovery_conflicts)
+ LogBlockedWalRecordInfo(CountDBBackends(dbid),
PROCSIG_RECOVERY_CONFLICT_DATABASE);

We log the recovery conflict into the server log but we don't update
the process title to append "waiting". While discussing the process
title update on recovery conflict, I got the review comment[1] that we
don't need to update the process title because no wait occurs when
recovery conflict with database happens. As the comment says, recovery
is canceling the existing processes on the database being removed, but
not waiting for something.

Regards,

[1] https://www.postgresql.org/message-id/84e4ea5f-80ec-9862-d419-c4433b3c2965%40oss.nttdata.com

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-07-07 07:58:59 Re: Quick doc patch
Previous Message Amit Langote 2020-07-07 07:28:20 Re: [PATCH] Performance Improvement For Copy From Binary Files