Re: shared-memory based stats collector

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: gkokolatos(at)protonmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2021-03-10 08:51:37
Message-ID: 20210310.175137.77410390117009989.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 10 Mar 2021 15:20:43 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> On 2021/03/10 12:10, Kyotaro Horiguchi wrote:
> > Agreed. The code moved to the original place and added the crash
> > handling code. And I added a phrase to the comment.
> > + * Was it the archiver? If exit status is zero (normal) or one (FATAL
> > + * exit), we assume everything is all right just like normal backends
> > + * and just try to restart a new one so that we immediately retry
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > + * archiving of remaining files. (If fail, we'll try again in future
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> "of" of "archiving of remaining" should be replaced with "the", or removed?

Either will do. I don't mind turning the gerund (archiving) into a
gerund phrase (archiving remaining files).

> Just for record. Previously LogChildExit() was called and the following LOG
> message was output when the archiver reported FATAL error. OTOH the patch
> prevents that and the following LOG message is not output at FATAL exit of
> archiver. But I don't think that the following message is required in that
> case
> because FATAL message indicating the similar thing is already output.
> Therefore, I'm ok with the patch.
>
> LOG: archiver process (PID 46418) exited with exit code 1

Yeah, that's the same behavor with wal receiver.

> >> I read v50_003 patch.
> >>
> >> When archiver dies, ProcGlobal->archiverLatch should be reset to NULL,
> >> like walreceiver does the similar thing in WalRcvDie()?
> > Differently from walwriter and checkpointer, archiver as well as
> > walreceiver may die while server is running. Leaving the latch pointer
> > alone may lead to nudging a wrong process that took over the same
> > procarray slot. Added pgarch_die() to do that.
>
> Thanks!
>
> + if (IsUnderPostmaster && ProcGlobal->archiverLatch)
> + SetLatch(ProcGlobal->archiverLatch);
>
> The latch can be reset to NULL in pgarch_die() between the if-condition and
> SetLatch(), and which would be problematic. Probably we should protect
> the access to the latch by using spinlock, like we do for walreceiver's latch?

Ugg. Right. I remember about that bug. I moved the archiverLatch out
of ProcGlobal to a dedicate local struct PgArch and placed a spinlock
together.

Thanks for the review! v52 is attached.

Other than the archiver fix above, A bug of 0004 in handling of
replication slot stats that leads to a hang is fixed. (It's the cause
of CF-bot failure.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v52-0001-sequential-scan-for-dshash.patch text/x-patch 9.2 KB
v52-0002-Add-conditional-lock-feature-to-dshash.patch text/x-patch 6.2 KB
v52-0003-Make-archiver-process-an-auxiliary-process.patch text/x-patch 21.0 KB
v52-0004-Shared-memory-based-stats-collector.patch text/x-patch 312.5 KB
v52-0005-Doc-part-of-shared-memory-based-stats-collector.patch text/x-patch 20.2 KB
v52-0006-Remove-the-GUC-stats_temp_directory.patch text/x-patch 13.7 KB
v52-0007-Exclude-pg_stat-directory-from-base-backup.patch text/x-patch 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2021-03-10 09:03:24 Re: Procedures versus the "fastpath" API
Previous Message Matthias van de Meent 2021-03-10 08:35:10 Re: Improvements and additions to COPY progress reporting