Re: Some problems of recovery conflict wait events

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Some problems of recovery conflict wait events
Date: 2020-02-26 07:19:09
Message-ID: CA+fd4k63ukOtdNx2f-fUZ2vuB3RgE=Po+xSnpmcPJbKqsJMtiA@mail.gmail.com
Lists: pgsql-hackers

On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> Hi all,
>
> When recovery conflicts happen on the streaming replication standby,
> the wait event of the startup process is null when
> max_standby_streaming_delay = 0 (to be exact, when the limit time
> calculated from max_standby_streaming_delay is behind the last WAL
> data receipt time). Moreover, the process title of the waiting
> startup process looks odd in the case of lock conflicts.
>
> 1. When max_standby_streaming_delay > 0 and the startup process
> conflicts with a lock,
>
> * wait event
>  backend_type | wait_event_type | wait_event
> --------------+-----------------+------------
>  startup      | Lock            | relation
> (1 row)
>
> * ps
> 42513 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting
>
> Looks good.
>
> 2. When max_standby_streaming_delay > 0 and the startup process
> conflicts with a snapshot,
>
> * wait event
>  backend_type | wait_event_type | wait_event
> --------------+-----------------+------------
>  startup      |                 |
> (1 row)
>
> * ps
> 44299 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting
>
> wait_event_type and wait_event are null in spite of waiting for
> conflict resolution.
>
> 3. When max_standby_streaming_delay = 0 and the startup process
> conflicts with a lock,
>
> * wait event
>  backend_type | wait_event_type | wait_event
> --------------+-----------------+------------
>  startup      |                 |
> (1 row)
>
> * ps
> 46510 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting waiting
>
> wait_event_type and wait_event are null and the process title is
> wrong; "waiting" appears twice.
>
> The cause of the first problem, wait_event_type and wait_event not
> being set, is that WaitExceedsMaxStandbyDelay, which is called by
> ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> using pg_usleep rather than WaitLatch. I think we can change it so
> that it uses WaitLatch and its callers pass wait event information.
>
> For the second problem, the wrong process title, the cause is also
> related to ResolveRecoveryConflictWithVirtualXIDs; in the case of lock
> conflicts we add "waiting" to the process title in WaitOnLock but we
> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> have WaitOnLock not set the process title in the recovery case.
>
> This problem exists on 12, 11 and 10. I'll submit the patch.
>

I've attached patches that fix the above two issues.

The 0001 patch fixes the first problem. Currently there are 5 types of
recovery conflict resolution: snapshot, tablespace, lock, database and
buffer pin, and we set wait events for only 2 of the 5: lock (only
when doing ProcWaitForSignal) and buffer pin. Therefore, users cannot
tell whether the startup process is waiting, or what it is waiting
for. This patch sets wait events in 3 more places: snapshot,
tablespace and lock. For those 3 places, I thought that we could
create a new, more appropriate wait event type, say RecoveryConflict,
and set it for them. However, considering back-patching to existing
versions, adding a new wait event type would not be acceptable. So
this patch sets existing wait events such as PG_WAIT_LOCK in those 3
places and does not set a wait event for conflict resolution on
dropping a database because there is no appropriate existing one.
I'll start a separate thread about improving the wait events of
recovery conflict resolution for PG13 if necessary.
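
As a rough illustration of the first problem, here is a minimal
standalone sketch of why a wait that reports a wait event is visible to
monitoring while a plain sleep is not. It is not code from the patch
and does not use any PostgreSQL headers: every demo_* function and
DEMO_* constant below is a hypothetical stand-in, and the "observed"
variable only models what a monitoring session might read while the
process is blocked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's definitions. */
#define DEMO_WAIT_NONE 0u
#define DEMO_WAIT_LOCK 1u

/* Models the wait event the process currently advertises. */
static uint32_t demo_wait_event_info = DEMO_WAIT_NONE;

/* Models what a monitoring session would read during the wait. */
static uint32_t demo_observed_during_wait = DEMO_WAIT_NONE;

/* Stand-in for blocking: record what is advertised at that moment. */
static void
demo_block(void)
{
    demo_observed_during_wait = demo_wait_event_info;
}

/* Like a plain sleep: blocks without advertising any wait event. */
static void
demo_wait_plain_sleep(void)
{
    demo_block();
}

/* The caller passes the wait event; it is set only while blocked. */
static void
demo_wait_with_event(uint32_t wait_event_info)
{
    assert(wait_event_info != DEMO_WAIT_NONE);

    demo_wait_event_info = wait_event_info;
    demo_block();
    demo_wait_event_info = DEMO_WAIT_NONE;
}
```

In this model, calling demo_wait_plain_sleep() leaves the observed
value at DEMO_WAIT_NONE, which corresponds to the null wait_event
columns above, while demo_wait_with_event(DEMO_WAIT_LOCK) makes the
event observable for the duration of the wait and clears it afterwards.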

The 0002 patch fixes the second problem. With this patch, the process
title is updated properly in all recovery conflict resolution cases.
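
As a rough illustration of the second problem, here is a minimal
standalone sketch of how "waiting" ends up in the title twice, and how
skipping the title update in the lock-wait path during recovery avoids
it. This follows the idea proposed in the quoted message and is not
necessarily what 0002 does. All demo_* names and the
skip_title_in_recovery flag are hypothetical and are not PostgreSQL
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_TITLE_MAX 128

/* Append " waiting" to the title unconditionally. */
static void
demo_append_waiting(char *title, size_t size)
{
    size_t len;

    assert(title != NULL);
    len = strlen(title);
    assert(len < size);
    snprintf(title + len, size - len, " waiting");
}

/* Stand-in for the conflict resolution path, which sets the title. */
static void
demo_resolve_conflict(char *title, size_t size)
{
    demo_append_waiting(title, size);
}

/*
 * Stand-in for the lock-wait path. Without the skip flag it appends
 * " waiting" itself and then, in recovery, enters conflict resolution,
 * which appends it a second time.
 */
static void
demo_wait_on_lock(char *title, size_t size,
                  bool in_recovery, bool skip_title_in_recovery)
{
    if (!(in_recovery && skip_title_in_recovery))
        demo_append_waiting(title, size);

    if (in_recovery)
        demo_resolve_conflict(title, size);
}
```

With in_recovery = true and skip_title_in_recovery = false, a title
such as "startup recovering 000000010000000000000003" ends in
"waiting waiting", as in case 3 above. With the skip flag set, only the
conflict resolution path touches the title and "waiting" appears once.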

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Set-wait-events-for-recovery-conflict-resolution.patch application/octet-stream 4.6 KB
0002-Fix-process-title-update-during-recovery-conflicts.patch application/octet-stream 6.3 KB
