Re: Some problems of recovery conflict wait events

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Some problems of recovery conflict wait events
Date: 2020-02-29 03:36:30
Message-ID: CA+fd4k7_f6-yQLiwH0YVKN-J2C1NRbOJxF1LbAZW=kn-98X4=w@mail.gmail.com
Lists: pgsql-hackers

On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > Hi all,
> >
> > When recovery conflicts happen on the streaming replication standby,
> > the wait event of the startup process is null when
> > max_standby_streaming_delay = 0 (to be exact, when the time limit
> > calculated from max_standby_streaming_delay and the last WAL data
> > receipt time has already passed). Moreover, the process title of the
> > waiting startup process looks odd in the case of lock conflicts.
> >
> > 1. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      | Lock            | relation
> > (1 row)
> >
> > * ps
> > 42513 ?? Ss 0:00.05 postgres: startup recovering
> > 000000010000000000000003 waiting
> >
> > Looks good.
> >
> > 2. When max_standby_streaming_delay > 0 and the startup process
> > conflicts with a snapshot,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      |                 |
> > (1 row)
> >
> > * ps
> > 44299 ?? Ss 0:00.05 postgres: startup recovering
> > 000000010000000000000003 waiting
> >
> > wait_event_type and wait_event are null in spite of waiting for
> > conflict resolution.
> >
> > 3. When max_standby_streaming_delay = 0 and the startup process
> > conflicts with a lock,
> >
> > * wait event
> >  backend_type | wait_event_type | wait_event
> > --------------+-----------------+------------
> >  startup      |                 |
> > (1 row)
> >
> > * ps
> > 46510 ?? Ss 0:00.05 postgres: startup recovering
> > 000000010000000000000003 waiting waiting
> >
> > wait_event_type and wait_event are null and the process title is
> > wrong; "waiting" appears twice.
> >
> > The cause of the first problem, wait_event_type and wait_event not
> > being set, is that WaitExceedsMaxStandbyDelay, which is called by
> > ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
> > using pg_usleep rather than WaitLatch. I think we can change it so
> > that it uses WaitLatch and its callers pass wait event information.
> >
> > For the second problem, the wrong process title, the cause is also
> > related to ResolveRecoveryConflictWithVirtualXIDs; in case of lock
> > conflicts we add "waiting" to the process title in WaitOnLock but we
> > add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
> > have WaitOnLock not set the process title during recovery.
> >
> > These problems exist on 12, 11 and 10. I'll submit a patch.
> >
>
> I've attached patches that fix the above two issues.
>
> The 0001 patch fixes the first problem. Currently there are 5 types of
> recovery conflict resolution: snapshot, tablespace, lock, database and
> buffer pin, and we set wait events for only 2 of the 5: lock (only
> when doing ProcWaitForSignal) and buffer pin. Therefore, users cannot
> tell whether the startup process is waiting, or what it is waiting
> for. This patch sets wait events for 3 more cases: snapshot,
> tablespace and lock. For those 3 cases, I thought that we could create
> a new, more appropriate wait event type, say RecoveryConflict, and set
> it for them. However, considering back-patching to existing versions,
> adding a new wait event type would not be acceptable. So this patch
> sets existing wait events such as PG_WAIT_LOCK in those 3 places and
> does not set a wait event for conflict resolution on dropping a
> database because there is no appropriate existing one. I'll start a
> separate thread about improving the wait events of recovery conflict
> resolution for PG13 if necessary.
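
To make the first problem concrete for readers of the archive, here is
a minimal toy model in Python. It is not PostgreSQL source and not part
of the patches; the class and method names are hypothetical. It only
illustrates why a plain sleep leaves the wait event null for an
observer, while a wait that advertises an event for its duration does
not.

```python
from typing import List, Optional


class StartupProcessModel:
    """Toy stand-in for a process whose wait event is visible to monitoring."""

    def __init__(self) -> None:
        # What a monitoring query would report right now; None means "null".
        self.wait_event: Optional[str] = None
        # Values an observer saw while the process was sleeping.
        self.samples: List[Optional[str]] = []

    def _sleep(self) -> None:
        # Stand-in for the real sleep: record what an observer would see.
        self.samples.append(self.wait_event)

    def wait_unreported(self) -> None:
        # Plain sleep: nothing is advertised while waiting.
        self._sleep()

    def wait_reported(self, wait_event: str) -> None:
        # Advertise the wait event only for the duration of the wait.
        self.wait_event = wait_event
        try:
            self._sleep()
        finally:
            self.wait_event = None


proc = StartupProcessModel()
proc.wait_unreported()
proc.wait_reported("Lock")
print(proc.samples)  # [None, 'Lock']
```

The first sample corresponds to the null wait_event rows shown above,
the second to a wait that passes wait event information down to the
point where it actually sleeps.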

The attached patch improves the wait events of recovery conflict
resolution. It's for PG13. I added a new RecoveryConflict
wait_event_type and some wait_event values. This patch can be applied
on top of the two patches I already proposed.
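
As a rough sketch of what a separate wait_event_type means, the Python
model below packs a wait class and an event number into one integer and
decodes it back into the two columns shown in the tables above. This is
an illustration only, not the patch: the RecoveryConflict class value
and its event names are hypothetical, and the Lock and BufferPin values
and the bit layout are written from memory of pgstat.h, so treat them
as assumptions as well.

```python
from typing import Dict, Tuple

# Assumed layout: wait class in the top byte, event number in the low 16 bits.
WAIT_CLASS_MASK = 0xFF000000
WAIT_EVENT_MASK = 0x0000FFFF

PG_WAIT_LOCK = 0x03000000                # assumed existing class value
PG_WAIT_BUFFER_PIN = 0x04000000          # assumed existing class value
PG_WAIT_RECOVERY_CONFLICT = 0x0B000000   # hypothetical new class value

WAIT_CLASS_NAMES: Dict[int, str] = {
    PG_WAIT_LOCK: "Lock",
    PG_WAIT_BUFFER_PIN: "BufferPin",
    PG_WAIT_RECOVERY_CONFLICT: "RecoveryConflict",
}

# Hypothetical event names under the new class.
RECOVERY_CONFLICT_EVENTS: Dict[int, str] = {
    0: "Snapshot",
    1: "Tablespace",
    2: "Lock",
    3: "Database",
}


def make_wait_event_info(class_id: int, event_id: int) -> int:
    """Combine a wait class and an event number into one value."""
    return class_id | (event_id & WAIT_EVENT_MASK)


def decode_wait_event_info(info: int) -> Tuple[str, str]:
    """Return (wait_event_type, wait_event) for a combined value."""
    class_id = info & WAIT_CLASS_MASK
    event_id = info & WAIT_EVENT_MASK
    type_name = WAIT_CLASS_NAMES.get(class_id, "???")
    if class_id == PG_WAIT_RECOVERY_CONFLICT:
        event_name = RECOVERY_CONFLICT_EVENTS.get(event_id, "unknown")
    else:
        # Event names of other classes are not modeled here.
        event_name = "event #%d" % event_id
    return type_name, event_name


info = make_wait_event_info(PG_WAIT_RECOVERY_CONFLICT, 0)
print(decode_wait_event_info(info))  # ('RecoveryConflict', 'Snapshot')
```

With a dedicated class, a conflict on dropping a database could also be
reported, which is the case that has no appropriate existing wait event.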

Regards,

[1] https://www.postgresql.org/message-id/CA%2Bfd4k63ukOtdNx2f-fUZ2vuB3RgE%3DPo%2BxSnpmcPJbKqsJMtiA%40mail.gmail.com

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: 0003-Improve-wait-events-of-recovery-conflict-resolution.patch
Content-Type: application/octet-stream
Size: 7.9 KB
