Re: Some problems of recovery conflict wait events

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Some problems of recovery conflict wait events
Date: 2020-03-04 04:13:19
Message-ID: CA+fd4k42mqvEd6J9x0yD4Zpya9nXK0CwSOtMs6ju7edj-da0sw@mail.gmail.com
Lists: pgsql-hackers

On Wed, 4 Mar 2020 at 11:04, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2020/02/29 12:36, Masahiko Sawada wrote:
> > On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >>
> >> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
> >> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> When recovery conflicts happen on the streaming replication standby,
> >>> the wait event of the startup process is null when
> >>> max_standby_streaming_delay = 0 (to be exact, when the limit time
> >>> calculated by max_standby_streaming_delay is behind the last WAL data
> >>> receipt time). Moreover, the process title of the waiting startup
> >>> process looks odd in the case of lock conflicts.
> >>>
> >>> 1. When max_standby_streaming_delay > 0 and the startup process
> >>> conflicts with a lock,
> >>>
> >>> * wait event
> >>>  backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>  startup      | Lock            | relation
> >>> (1 row)
> >>>
> >>> * ps
> >>> 42513 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting
> >>>
> >>> Looks good.
> >>>
> >>> 2. When max_standby_streaming_delay > 0 and the startup process
> >>> conflicts with a snapshot,
> >>>
> >>> * wait event
> >>>  backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>  startup      |                 |
> >>> (1 row)
> >>>
> >>> * ps
> >>> 44299 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting
> >>>
> >>> wait_event_type and wait_event are null in spite of waiting for
> >>> conflict resolution.
> >>>
> >>> 3. When max_standby_streaming_delay = 0 and the startup process
> >>> conflicts with a lock,
> >>>
> >>> * wait event
> >>>  backend_type | wait_event_type | wait_event
> >>> --------------+-----------------+------------
> >>>  startup      |                 |
> >>> (1 row)
> >>>
> >>> * ps
> >>> 46510 ?? Ss 0:00.05 postgres: startup recovering 000000010000000000000003 waiting waiting
> >>>
> >>> wait_event_type and wait_event are null and the process title is
> >>> wrong; "waiting" appears twice.
> >>>
> >>> The cause of the first problem, wait_event_type and wait_event not
> >>> being set, is that WaitExceedsMaxStandbyDelay, which is called by
> >>> ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> >>> using pg_usleep rather than WaitLatch. I think we can change it so
> >>> that it uses WaitLatch and its callers pass the wait event information.
> >>>
> >>> For the second problem, the wrong process title, the cause is also
> >>> related to ResolveRecoveryConflictWithVirtualXIDs: in the case of lock
> >>> conflicts we add "waiting" to the process title in WaitOnLock, but we
> >>> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> >>> have WaitOnLock not set the process title in the recovery case.
> >>>
> >>> This problem exists on 12, 11 and 10. I'll submit the patch.
> >>>
> >>
> >> I've attached patches that fix the above two issues.
> >>
> >> The 0001 patch fixes the first problem. Currently there are 5 types of
> >> recovery conflict resolution: snapshot, tablespace, lock, database and
> >> buffer pin, and we set wait events for only 2 of the 5: lock
> >> (only when doing ProcWaitForSignal) and buffer pin.
>
> +1 to add those new wait events in the master. But adding them sounds like
> a new feature rather than a bug fix. So ISTM that it's not back-patchable...
>

Yeah, so the 0001 patch sets existing wait events for recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
for a recovery conflict on a snapshot. The 0003 patch improves these
wait events by adding new wait event types such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore the 0001 (and 0002)
patches are the fix for existing versions and the 0003 patch is an
improvement for PG13 only. Did you mean that even the 0001 patch isn't
suitable for back-patching?
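
As a rough illustration of what "setting an existing wait event" means
here, the following standalone sketch packs a wait class and an event id
into one 32-bit value and unpacks it again. All SKETCH_* names and the
numeric values are stand-ins invented for this example; they are
assumptions, not the definitions from the PostgreSQL headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for this sketch only. The real wait class and
 * lock tag definitions live in the PostgreSQL headers; the values below
 * are assumptions chosen for illustration.
 */
#define SKETCH_WAIT_LOCK            0x03000000U  /* assumed "Lock" wait class */
#define SKETCH_CLASS_MASK           0xFF000000U  /* upper byte: wait class */
#define SKETCH_EVENT_MASK           0x0000FFFFU  /* lower 16 bits: event id */
#define SKETCH_LOCKTAG_TRANSACTION  4U           /* assumed lock tag id */

/* Pack a wait class and an event id into one 32-bit value. */
uint32_t
sketch_make_wait_event(uint32_t wait_class, uint32_t event_id)
{
    /* The two parts must not overlap, or decoding would be ambiguous. */
    assert((wait_class & ~SKETCH_CLASS_MASK) == 0);
    assert((event_id & ~SKETCH_EVENT_MASK) == 0);
    return wait_class | event_id;
}

/* Extract the wait class (what wait_event_type would be derived from). */
uint32_t
sketch_wait_class(uint32_t wait_event_info)
{
    return wait_event_info & SKETCH_CLASS_MASK;
}

/* Extract the event id (what wait_event would be derived from). */
uint32_t
sketch_wait_event_id(uint32_t wait_event_info)
{
    return wait_event_info & SKETCH_EVENT_MASK;
}
```

With these stand-ins, sketch_make_wait_event(SKETCH_WAIT_LOCK,
SKETCH_LOCKTAG_TRANSACTION) combines the "Lock" class with the
transaction lock tag, so no new event identifier is needed to report
the wait.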

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
