Re: shared-memory based stats collector

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, michael(at)paquier(dot)xyz, thomas(dot)munro(at)gmail(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, a(dot)zakirov(at)postgrespro(dot)ru, ah(at)cybertec(dot)at, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2020-03-09 18:47:54
Message-ID: 20200309184754.yvrgzqpzs3iynszq@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-03-09 15:37:05 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
>
> In patch 0003,
>
> >   /*
> > -  * Was it the archiver? If so, just try to start a new one; no need
> > -  * to force reset of the rest of the system. (If fail, we'll try
> > -  * again in future cycles of the main loop.). Unless we were waiting
> > -  * for it to shut down; don't restart it in that case, and
> > -  * PostmasterStateMachine() will advance to the next shutdown step.
> > +  * Was it the archiver? Normal exit can be ignored; we'll start a new
> > +  * one at the next iteration of the postmaster's main loop, if
> > +  * necessary. Any other exit condition is treated as a crash.
> >    */
> >   if (pid == PgArchPID)
> >   {
> >       PgArchPID = 0;
> >       if (!EXIT_STATUS_0(exitstatus))
> > -         LogChildExit(LOG, _("archiver process"),
> > -                      pid, exitstatus);
> > -     if (PgArchStartupAllowed())
> > -         PgArchPID = pgarch_start();
> > +         HandleChildCrash(pid, exitstatus,
> > +                          _("archiver process"));
> >       continue;
> >   }
>
> I'm worried that we're causing all processes to terminate when an
> archiver dies in some ugly way; but in the current coding, it's pretty
> harmless and we'd just start a new one. I think this needs to be
> reconsidered. As far as I know, pgarchiver remains unconnected to
> shared memory so a crash-restart cycle is not necessary. We should
> continue to just log the error message and move on.

Why is it worth having the archiver be "robust" in that way? Apart from
the fact that random implementation details left it unconnected to
shared memory, which is what allows a restart after any exit code, I
don't see a need. As far as I can see, it has no exit paths that could
validly produce another exit code.
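
For reference, the difference between the two behaviors can be sketched
as a toy model. This is only an illustration under stated assumptions,
not postmaster code: ArchiverExitAction, old_policy and new_policy are
hypothetical names, the startup_allowed flag stands in for
PgArchStartupAllowed(), and a zero exitstatus stands in for
EXIT_STATUS_0().

```c
#include <stdbool.h>

/*
 * Hypothetical model of what the postmaster decides when the archiver
 * exits. None of these names exist in PostgreSQL.
 */
typedef struct ArchiverExitAction
{
    bool log_exit;      /* emit a log line about the exit */
    bool restart_now;   /* start a replacement archiver immediately */
    bool crash_restart; /* run the crash-restart cycle for all children */
} ArchiverExitAction;

/*
 * Behavior of the removed lines: log a nonzero exit, then start a new
 * archiver right away if that is allowed. Nothing else is affected.
 */
static ArchiverExitAction
old_policy(int exitstatus, bool startup_allowed)
{
    ArchiverExitAction action = {false, false, false};

    if (exitstatus != 0)
        action.log_exit = true;
    if (startup_allowed)
        action.restart_now = true;
    return action;
}

/*
 * Behavior of the added lines: a normal exit is ignored (the main loop
 * is expected to start a new archiver later if necessary), any other
 * exit is treated as a crash. Logging done by the crash handling itself
 * is not modeled here.
 */
static ArchiverExitAction
new_policy(int exitstatus)
{
    ArchiverExitAction action = {false, false, false};

    if (exitstatus != 0)
        action.crash_restart = true;
    return action;
}
```

In this model, old_policy(1, true) logs and restarts only the archiver,
while new_policy(1) requests a crash-restart cycle; neither treats exit
status 0 as a failure.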

Greetings,

Andres Freund
