Re: shared-memory based stats collector

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: gkokolatos(at)protonmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2021-03-09 14:24:10
Message-ID: 894e5991-f7b5-43b5-aece-7175221e60ca@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/03/09 16:51, Kyotaro Horiguchi wrote:
> At Sat, 6 Mar 2021 00:32:07 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>
>>
>> On 2021/03/05 17:18, Kyotaro Horiguchi wrote:
>>> At Thu, 21 Jan 2021 12:03:48 +0900 (JST), Kyotaro Horiguchi
>>> <horikyota(dot)ntt(at)gmail(dot)com> wrote in
>>>> Commit 960869da08 (database statistics) conflicted with this. Rebased.
>>>>
>>>> I'm concerned about the behavior that pgstat_update_connstats calls
>>>> GetCurrentTimestamp() every time stats update happens (with intervals
>>>> of 10s-60s in this patch). But I didn't change that design since that
>>>> happens with about 0.5s intervals in master and the rate is largely
>>>> reduced in this patch, to make this patch simpler.
>>> I stepped on my foot, and another commit coflicted. Just rebased.
>>
>> Thanks for rebasing the patches!
>>
>> I think that 0003 patch is self-contained and useful, for example
>> which
>> enables us to monitor archiver process in pg_stat_activity. So IMO
>> it's worth pusing 0003 patch firstly.
>
> I'm not sure archiver process is worth realtime monitoring, but I
> agree that the patch makes it possible. Anyway it is requried in this
> patchset and I'm happy to see it committed beforehand.
>
> Thanks for the review.
>
>> Here are the review comments for 0003 patch.
>>
>> + /* Archiver process's latch */
>> + Latch *archiverLatch;
>> + /* Current shared estimate of appropriate spins_per_delay value */
>>
>> The last line in the above seems not necessary.
>
> Oops. It seems like a garbage after a past rebasing. Removed.
>
>> In proc.h, NUM_AUXILIARY_PROCS needs to be incremented.
>
> Right. Increased to 5 and rewrote the comment.
>
>> /* ----------
>> * Functions called from postmaster
>> * ----------
>> */
>> extern int pgarch_start(void);
>>
>> In pgarch.h, the above is not necessary.
>
> Removed.
>
>> +extern void XLogArchiveWakeup(void);
>>
>> This seems no longer necessary.
>>
>> +extern void XLogArchiveWakeupStart(void);
>> +extern void XLogArchiveWakeupEnd(void);
>> +extern void XLogArchiveWakeup(void);
>>
>> These seem also no longer necessary.
>
> Sorry for many garbages. Removed all of them.
>
>> PgArchPID = 0;
>> if (!EXIT_STATUS_0(exitstatus))
>> - LogChildExit(LOG, _("archiver process"),
>> - pid, exitstatus);
>> - if (PgArchStartupAllowed())
>> - PgArchPID = pgarch_start();
>> + HandleChildCrash(pid, exitstatus,
>> + _("archiver process"));
>>
>> I don't think that we should treat non-zero exit condition as a crash,
>> as before. Otherwise when archive_command fails on a signal,
>> archiver emits FATAL error and which leads the server restart.
>
> Sounds reasonable. Now archiver is treated the same way to wal
> receiver. Specifically exit(1) doesn't cause server restart.

Thanks!

- if (PgArchStartupAllowed())
- PgArchPID = pgarch_start();

In the latest patch, why did you remove the code to restart new archiver
in reaper()? When archiver dies, I think new archiver should be restarted like
the current reaper() does. Otherwise, the restart of archiver can be
delayed until the next cycle of ServerLoop, which may take time.

>
>> - * walwriter, autovacuum, or background worker.
>> + * walwriter, autovacuum, archiver or background worker.
>> *
>> * The objectives here are to clean up our local state about the child
>> * process, and to signal all other remaining children to quickdie.
>> @@ -3609,6 +3606,18 @@ HandleChildCrash(int pid, int exitstatus, const
>> char *procname)
>> signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
>> }
>> + /* Take care of the archiver too */
>> + if (pid == PgArchPID)
>> + PgArchPID = 0;
>> + else if (PgArchPID != 0 && take_action)
>> + {
>> + ereport(DEBUG2,
>> + (errmsg_internal("sending %s to process %d",
>> + (SendStop ? "SIGSTOP" : "SIGQUIT"),
>> + (int) PgArchPID)));
>> + signal_child(PgArchPID, (SendStop ? SIGSTOP : SIGQUIT));
>> + }
>> +
>>
>> Same as above.
>
> Mmm. In the first place, I found that I forgot to remove existing
> code to handle archiver... Removed it instead of the above, which is
> added by this patch. Since the process becomes an auxiliary process,
> no reason to differentiate from other auxiliary processes in handling?

Yes, ok.

>
>> In xlogarchive.c, "#include "storage/pmsignal.h"" is no longer
>> necessary.
>
> Removed.
>
>> pgarch_forkexec() should be removed from pgarch.c because it's no
>> longer used.
>
> Right. Removed. EXEC_BACKEND still fails for another reason of
> prototype mismatch of PgArchiverMain. Fixed it toghether.
>
>> /* ------------------------------------------------------------
>> * Public functions called from postmaster follow
>> * ------------------------------------------------------------
>> */
>>
>> The definition of PgArchiverMain() should be placed just
>> after the above comment.
>
> The module no longer have a function called from postmaster. Now
> PgArchiverMain() is placed just below "/* Main entry point...".
>
>> exit(0) in PgArchiverMain() should be proc_exit(0)?
>
> Yeah, proc_exit() says as it should be the only function to call
> exit() directly.
>
> By the way, the patch (0003) removes the flag "wakened". The flag was
> originally added to prevent spurious wakeups (66ec2db7284). At the
> time the pg_ysleep in pgarch_MainLoop was replaced with WaitLatch by
> 89fd72cbf26, the flag survived but with losing its effect since
> WaitLatch doesn't get spurious wakeups (AFAICS). So if the change
> (removal of "wakened") is correct, it might worth another patch.
>
> I'll send new patchset soon.

I read v50_003 patch.

When archiver dies, ProcGlobal->archiverLatch should be reset to NULL,
like walreceiver does the similar thing in WalRcvDie()?

In pgarch.c, #include "postmaster/fork_process.h" seems no longer necessary.

+ if (strcmp(argv[1], "--forkarch") == 0)
+ {
+ /* Restore basic shared memory pointers */
+ InitShmemAccess(UsedShmemSegAddr);
+
+ /* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+ InitAuxiliaryProcess();
+
+ /* Attach process to shared data structures */
+ CreateSharedMemoryAndSemaphores();
+
+ PgArchiverMain(); /* does not return */
+ }

Why is this necessary? I was thinking that "--forkboot" handles archiver
in SubPostmasterMain().

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-03-09 14:27:11 Re: Nicer error when connecting to standby with hot_standby=off
Previous Message James Coleman 2021-03-09 14:19:37 Re: Nicer error when connecting to standby with hot_standby=off