Re: pg_stat_io for the startup process

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)postgresql(dot)org, melanieplageman(at)gmail(dot)com
Subject: Re: pg_stat_io for the startup process
Date: 2023-05-03 13:11:33
Message-ID: CAGPVpCRUWfq4Wb+a1+uBmqQCGt-HtqgxtB+MfaiC48Cpo-AmcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Andres Freund <andres(at)anarazel(dot)de>, 27 Nis 2023 Per, 19:27 tarihinde şunu
yazdı:

> Huh - do you have a link to the failure? That's how I would expect it to be
> done.

I think I should have registered it in the beginning of
PerformWalRecovery() and not just before the main redo loop
since HandleStartupProcInterrupts is called before the loop too.
Here's the error log otherise [1]

> #ifdef WAL_DEBUG
> > if (XLOG_DEBUG ||
> > (record->xl_rmid == RM_XACT_ID &&
> trace_recovery_messages <= DEBUG2) ||
> > @@ -3617,6 +3621,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,
> bool randAccess,
> > /* Do background tasks
> that might benefit us later. */
> >
> KnownAssignedTransactionIdsIdleMaintenance();
> >
> > + /*
> > + * Need to disable flush
> timeout to avoid unnecessary
> > + * wakeups. Enable it
> again after a WAL record is read
> > + * in PerformWalRecovery.
> > + */
> > +
> disable_startup_stat_flush_timeout();
> > +
> > (void)
> WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
> >
> WL_LATCH_SET | WL_TIMEOUT |
> >
> WL_EXIT_ON_PM_DEATH,
>
> I think always disabling the timer here isn't quite right - we want to
> wake up
> *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending
> stats before waiting - potentially for a long time - for WAL. One way
> would be
> just explicitly report before the WaitLatch().
>
> Another approach, I think better, would be to not use
> enable_timeout_every(),
> and to explicitly rearm the timer in HandleStartupProcInterrupts(). When
> called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do
> so
> at the end of WaitForWALToBecomeAvailable(). That way we also wouldn't
> repeatedly fire between calls to HandleStartupProcInterrupts().
>

Attached patch is probably not doing what you asked. IIUC
HandleStartupProcInterrupts should arm the timer but also shouldn't arm it
if it's called from WaitForWALToBecomeAvailable. But the timer should be
armed again at the very end of WaitForWALToBecomeAvailable. I've been
thinking about how to do this properly. Should HandleStartupProcInterrupts
take a parameter to decide whether the timer needs to be armed? Or need to
add an additional global flag to rearm the timer? Any thoughts?

[1]
https://api.cirrus-ci.com/v1/artifact/task/5282291971260416/testrun/build/testrun/recovery/010_logical_decoding_timelines/log/010_logical_decoding_timelines_replica.log

Best,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
v2-0001-Add-timeout-to-flush-stats-during-startup-s-main-replay-loop.patch application/octet-stream 5.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-05-03 13:17:45 Re: Logging parallel worker draught
Previous Message Drouvot, Bertrand 2023-05-03 12:16:54 Re: Add two missing tests in 035_standby_logical_decoding.pl