Re: pg_stat_io for the startup process

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_io for the startup process
Date: 2023-05-08 21:06:38
Message-ID: 20230508210638.qv6k7gui6mnfoxuu@liskov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 03, 2023 at 04:11:33PM +0300, Melih Mutlu wrote:
> Andres Freund <andres(at)anarazel(dot)de>, 27 Nis 2023 Per, 19:27 tarihinde şunu yazdı:
> > #ifdef WAL_DEBUG
> > > if (XLOG_DEBUG ||
> > > (record->xl_rmid == RM_XACT_ID &&
> > trace_recovery_messages <= DEBUG2) ||
> > > @@ -3617,6 +3621,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,
> > bool randAccess,
> > > /* Do background tasks
> > that might benefit us later. */
> > >
> > KnownAssignedTransactionIdsIdleMaintenance();
> > >
> > > + /*
> > > + * Need to disable flush
> > timeout to avoid unnecessary
> > > + * wakeups. Enable it
> > again after a WAL record is read
> > > + * in PerformWalRecovery.
> > > + */
> > > +
> > disable_startup_stat_flush_timeout();
> > > +
> > > (void)
> > WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
> > >
> > WL_LATCH_SET | WL_TIMEOUT |
> > >
> > WL_EXIT_ON_PM_DEATH,
> >
> > I think always disabling the timer here isn't quite right - we want to
> > wake up
> > *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending
> > stats before waiting - potentially for a long time - for WAL. One way
> > would be
> > just explicitly report before the WaitLatch().
> >
> > Another approach, I think better, would be to not use
> > enable_timeout_every(),
> > and to explicitly rearm the timer in HandleStartupProcInterrupts(). When
> > called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do
> > so
> > at the end of WaitForWALToBecomeAvailable(). That way we also wouldn't
> > repeatedly fire between calls to HandleStartupProcInterrupts().
> >
>
> Attached patch is probably not doing what you asked. IIUC
> HandleStartupProcInterrupts should arm the timer but also shouldn't arm it
> if it's called from WaitForWALToBecomeAvailable. But the timer should be
> armed again at the very end of WaitForWALToBecomeAvailable. I've been
> thinking about how to do this properly. Should HandleStartupProcInterrupts
> take a parameter to decide whether the timer needs to be armed? Or need to
> add an additional global flag to rearm the timer? Any thoughts?

I had the same question about how to determine whether or not to rearm.

> From 9be7360e49db424c45c53e85efe8a4f5359b5244 Mon Sep 17 00:00:00 2001
> From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
> Date: Wed, 26 Apr 2023 18:21:32 +0300
> Subject: [PATCH v2] Add timeout to flush stats during startup's main replay
> loop
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index efc2580536..842394bc8f 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -72,6 +72,9 @@ static TimestampTz startup_progress_phase_start_time;
> */
> static volatile sig_atomic_t startup_progress_timer_expired = false;
>
> +/* Indicates whether flushing stats is needed. */
> +static volatile sig_atomic_t idle_stats_update_pending = false;
> +
> /*
> * Time between progress updates for long-running startup operations.
> */
> @@ -206,6 +209,18 @@ HandleStartupProcInterrupts(void)
> /* Perform logging of memory contexts of this process */
> if (LogMemoryContextPending)
> ProcessLogMemoryContextInterrupt();
> +
> + if (idle_stats_update_pending)
> + {
> + /* It's time to report wal stats. */

If we want dbstats to be updated, we'll probably have to call
pgstat_report_stat() here and fix the timestamp issue Horiguchi-san
points out upthread. Perhaps you could review those changes and consider
adding those as preliminary patches before this in a set.

I think you will then need to handle the issue he mentions with partial
flushes coming from pgstat_report_stat() and remembering to try and
flush stats again in case of a partial flush. Though, perhaps we can
just pass force=true.

> + pgstat_report_wal(true);
> + idle_stats_update_pending = false;
> + }

Good that you thought to check if the timeout was already active.

> + else if (!get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT))
> + {
> + /* Set the next timeout. */
> + enable_idle_stats_update_timeout();
> + }
> }
>
>
> @@ -385,3 +400,22 @@ has_startup_progress_timeout_expired(long *secs, int *usecs)
>
> return true;
> }
> +
> +/* Set a flag indicating that it's time to flush wal stats. */
> +void
> +idle_stats_update_timeout_handler(void)
> +{
> + idle_stats_update_pending = true;

Is WakeupRecovery() needed when the timer goes off and the startup
process is waiting on a latch? Does this happen in
WaitForWalToBecomeAvailable()?

> + WakeupRecovery();
> +}
> +
> +/* Enable the timeout set for wal stat flush. */
> +void
> +enable_idle_stats_update_timeout(void)
> +{
> + TimestampTz fin_time;
> +
> + fin_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
> + PGSTAT_MIN_INTERVAL);

It is a shame we have to end up calling GetCurrentTimestamp() since we
are using enable_timeout_at(). Couldn't we use enable_timeout_after()?

> + enable_timeout_at(IDLE_STATS_UPDATE_TIMEOUT, fin_time);
> +}

- Melanie

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-05-08 21:46:43 Re: pg_stat_io for the startup process
Previous Message Andres Freund 2023-05-08 20:41:09 Re: base backup vs. concurrent truncation