Re: Some problems of recovery conflict wait events

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Some problems of recovery conflict wait events
Date: 2020-03-30 11:10:01
Message-ID: CA+fd4k51DA2sYvAzQ5wKzrbgtRmLmuDQ0i5F4y0i2dat5+4yaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 27 Mar 2020 at 17:54, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2020/03/04 14:31, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>
> >>
> >>
> >> On 2020/03/04 13:27, Michael Paquier wrote:
> >>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
> >>>> Yeah, so 0001 patch sets existing wait events to recovery conflict
> >>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
> >>>> to the recovery conflict on a snapshot. 0003 patch improves these wait
> >>>> events by adding the new type of wait event such as
> >>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
> >>>> is the fix for existing versions and 0003 patch is an improvement for
> >>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching?
> >>
> >> Yes, it looks like a improvement rather than bug fix.
> >>
> >
> > Okay, understand.
> >
> >>> I got my eyes on this patch set. The full patch set is in my opinion
> >>> just a set of improvements, and not bug fixes, so I would refrain from
> >>> back-backpatching.
> >>
> >> I think that the issue (i.e., "waiting" is reported twice needlessly
> >> in PS display) that 0002 patch tries to fix is a bug. So it should be
> >> fixed even in the back branches.
> >
> > So we need only two patches: one fixes process title issue and another
> > improve wait event. I've attached updated patches.
>
> I started reading v2-0002-Improve-wait-events-for-recovery-conflict-resolut.patch.
>
> - ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
> + ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_BUFFER_PIN);
>
> Currently the wait event indicating the wait for buffer pin has already
> been reported. But the above change in the patch changes the name of
> wait event for buffer pin only in the startup process. Is this really useful?
> Isn't the existing wait event for buffer pin enough?
>
> - /* Wait to be signaled by the release of the Relation Lock */
> - ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
> + /* Wait to be signaled by the release of the Relation Lock */
> + ProcWaitForSignal(WAIT_EVENT_RECOVERY_CONFLICT_LOCK);
>
> Same as above. Isn't the existing wait event enough?

Yeah, we can use the existing wait events for buffer pin and lock.

>
> - /*
> - * Progressively increase the sleep times, but not to more than 1s, since
> - * pg_usleep isn't interruptible on some platforms.
> - */
> - standbyWait_us *= 2;
> - if (standbyWait_us > 1000000)
> - standbyWait_us = 1000000;
> + WaitLatch(MyLatch,
> + WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT,
> + STANDBY_WAIT_MS,
> + wait_event_info);
> + ResetLatch(MyLatch);
>
> ResetLatch() should be called before WaitLatch()?

Fixed.

>
> Could you tell me why you dropped the "increase-sleep-times" logic?

I thought we can remove it because WaitLatch is interruptible but my
observation was not correct. The waiting startup process is not
necessarily woken up by signal. I think it's still better to not wait
more than 1 sec even if it's an interruptible wait.

Attached patch fixes the above and introduces only two wait events of
conflict resolution: snapshot and tablespace. I also removed the wait
event of conflict resolution of database since it's unlikely to become
a user-visible and a long sleep as we discussed before.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
recovery_conflict_wait_event_v3.patch application/octet-stream 7.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message k.jamison@fujitsu.com 2020-03-30 11:59:08 RE: [Patch] Optimize dropping of relation buffers using dlist
Previous Message Rajkumar Raghuwanshi 2020-03-30 10:43:47 Re: WIP/PoC for parallel backup