Re: pg_stat_io for the startup process

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, andres(at)anarazel(dot)de, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)postgresql(dot)org, melanieplageman(at)gmail(dot)com
Subject: Re: pg_stat_io for the startup process
Date: 2023-04-26 18:53:16
Message-ID: CAGPVpCQHbiREfTrR=c4jH-5nVev8D613wv5fp-J+aOJmhZ0-Pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

Robert Haas <robertmhaas(at)gmail(dot)com>, 26 Nis 2023 Çar, 15:34 tarihinde şunu
yazdı:

> On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > 3. When should we call pgstat_report_stats on the startup process?
> >
> > During recovery, I think we can call pgstat_report_stats() (or a
> > subset of it) right before invoking WaitLatch and at segment
> > boundaries.
>
> I think this kind of idea is worth exploring. Andres mentioned timers,
> but it seems to me that something where we just do it at certain
> "convenient" points might be good enough and simpler. I'd much rather
> have statistics that were up to date as of the last time we finished a
> segment than have nothing at all.
>

I created a rough prototype of a timer-based approach for comparison.
Please see attached.

Registered a new timeout named "STARTUP_STAT_FLUSH_TIMEOUT", The timeout's
handler sets a flag indicating that io stats need to be flushed.
HandleStartupProcInterrupts checks the flag to flush stats.
It's enabled if any WAL record is read (in the main loop of
PerformWalRecovery). And it's disabled before WaitLatch to avoid
unnecessary wakeups if the startup process decides to sleep.

With those changes, I see that startup related rows in pg_stat_io are
updated without waiting for the startup process to exit.

postgres=# select context, reads, extends, hits from pg_stat_io where
> backend_type = 'startup';
> context | reads | extends | hits
> -----------+--------+---------+----------
> bulkread | 0 | | 0
> bulkwrite | 0 | 0 | 0
> normal | 6 | 1 | 41
> vacuum | 0 | 0 | 0
> (4 rows)

I'm not sure this is the correct way to implement this approach though. I
appreciate any comment.

Also; some questions about this implementation if you think it's worth
discussing:
1- I set an arbitrary timeout (1 sec) for testing. I don't know what the
correct value should be. Does it make sense to use
PGSTAT_MIN/MAX/IDLE_INTERVAL instead?
2- I'm also not sure if this timeout should be registered at the beginning
of StartupProcessMain, or does it even make any difference. I tried to do
this just before the main redo loop in PerformWalRecovery, but that made
the CI red.

Best,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
0001-Add-timeout-to-flush-stats-during-startup-s-main-replay-loop.patch application/octet-stream 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-04-26 19:10:09 Re: issue with meson builds on msys2
Previous Message Drouvot, Bertrand 2023-04-26 18:36:44 Re: Autogenerate some wait events code and documentation