Re: pg_stat_io for the startup process

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, andres(at)anarazel(dot)de, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_io for the startup process
Date: 2023-04-27 21:30:40
Message-ID: CAAKRu_YLBELwc7T6F_BFV_oKOzZNx2X+wbRiuR7U4FT-0-caCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 26, 2023 at 2:53 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> Hi all,
>
> Robert Haas <robertmhaas(at)gmail(dot)com>, 26 Nis 2023 Çar, 15:34 tarihinde şunu yazdı:
>>
>> On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi
>> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>> > 3. When should we call pgstat_report_stats on the startup process?
>> >
>> > During recovery, I think we can call pgstat_report_stats() (or a
>> > subset of it) right before invoking WaitLatch and at segment
>> > boundaries.
>>
>> I think this kind of idea is worth exploring. Andres mentioned timers,
>> but it seems to me that something where we just do it at certain
>> "convenient" points might be good enough and simpler. I'd much rather
>> have statistics that were up to date as of the last time we finished a
>> segment than have nothing at all.
>
>
> I created a rough prototype of a timer-based approach for comparison. Please see attached.
>
> Registered a new timeout named "STARTUP_STAT_FLUSH_TIMEOUT", The timeout's
> handler sets a flag indicating that io stats need to be flushed.
> HandleStartupProcInterrupts checks the flag to flush stats. It's enabled if
> any WAL record is read (in the main loop of PerformWalRecovery). And it's
> disabled before WaitLatch to avoid unnecessary wakeups if the startup process
> decides to sleep.

I was reviewing this and found that if I remove the calls to
disable_startup_stat_flush_timeout() the number of calls to
pgstat_report_wal() on a briefly active and then idle standby are about
8 in 100 seconds whereas with timer disablement, the calls over the same
period are about 40. I would have thought that disabling the timer would
have caused us to try and flush the stats less often.

With the calls to disable_startup_stat_flush_timeout(), we do have much
fewer calls to setitimer(), of course.

Given the below suggestion by Andres, I tried doing some traces of the
various approaches.

> > + disable_startup_stat_flush_timeout();
> > +
> > (void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
> > WL_LATCH_SET | WL_TIMEOUT |
> > WL_EXIT_ON_PM_DEATH,
>
> I think always disabling the timer here isn't quite right - we want to wake up
> *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending
> stats before waiting - potentially for a long time - for WAL. One way would be
> just explicitly report before the WaitLatch().
>
> Another approach, I think better, would be to not use enable_timeout_every(),
> and to explicitly rearm the timer in HandleStartupProcInterrupts(). When
> called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do so
> at the end of WaitForWALToBecomeAvailable(). That way we also wouldn't
> repeatedly fire between calls to HandleStartupProcInterrupts().

After a quick example implementation of this, I found that it seemed to
try and flush the stats less often on an idle standby (good) than using
enable_timeout_every().

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-04-27 22:14:43 Re: Possible regression setting GUCs on \connect
Previous Message Daniel Gustafsson 2023-04-27 21:25:30 Re: Should vacuum process config file reload more often