From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: background worker shudown (was Re: [HACKERS] Why does logical replication launcher exit with exit code 1?) |
Date: | 2018-10-10 03:59:29 |
Message-ID: | CAEepm=10MtmKeDc1WxBM0PQM9OgtNy+RCeWqz40pZRRS3PNo5Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 10, 2018 at 7:29 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Oct 4, 2018 at 7:37 PM Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > 0. The default SIGTERM handler for bgworkers is bgworker_die(), which
> > generates a FATAL ereport "terminating background worker \"%s\" due to
> > administrator command", directly in the signal handler (hmm, see also
> > recent discussions about the legality of similar code in quickdie()).
>
> Yeah, I think that code is bad news, for the same reasons discussed
> with regard to quickdie().
So I suppose we should just remove it, with something like 0002. I'm
a bit uneasy about existing code out there that might be not calling
CFI. OTOH I suspect that a lot of code copied worker_spi.c and
installed its own handler.
> > 1. Some processes install their own custom SIGTERM handler that sets
> > a flag, and use that to break out of their main loop and exit quietly.
> > Example: autovacuum.c, or the open-source pg_cron extension, and
> > probably many other things that just want a nice quiet shutdown.
> >
> > 2. Some processes install the standard SIGTERM handler die(), and
> > then use CHECK_FOR_INTERRUPTS() to break out of their main loop. By
> > default this looks like "FATAL: terminating connection due to
> > administrator command". This approach is effectively required if the
> > main loop will reach other code that has a CHECK_FOR_INTERRUPTS() wait
> > loop. For example, a "launcher"-type process that calls
> > WaitForBackgroundWorkerStartup() could hang forever if it used
> > approach 1 (ask me how I know).
>
> My experience with background workers has been that approach #1 does
> not really work. I mean, if you aren't doing anything complicated it
> might be OK, if for example you are executing SQL queries, and you try
> to do #1, then your SQL queries don't respond to interrupts. And that
> sucks. So I have generally adopted approach #2.
Maybe src/test/modules/worker_spi.c shouldn't use approach #1 (even if
it might technically be OK for that code)? I think I might have been
guilty of copying that.
> To put that another way, nearly everything can reach
> CHECK_FOR_INTERRUPTS(), so saying that this is required whenever that
> in the case isn't much different from saying that it is required, full
> stop.
> > 3. Some system processes generally use approach 2, but have a special
> > case in ProcessInterrupts() to suppress or alter the usual FATAL
> > message or behaviour. This includes the logical replication launcher.
>
> Maybe we could replace this by a general-purpose hook. So instead of
> the various tests for process types that are there now, we would just
> have
>
> if (procdie_hook != NULL)
> (*procdie_hook)();
>
> And that hook can do whatever you like (but probably including dying).
Ok, patch 0001 is a sketch like that, for discussion.
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
0001-Add-proc_die_hook-to-customize-die-interrupt-handlin.patch | application/octet-stream | 3.1 KB |
0002-Remove-async-signal-unsafe-bgworker_die-function.patch | application/octet-stream | 1.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2018-10-10 04:01:50 | Re: partition tree inspection functions |
Previous Message | Amit Langote | 2018-10-10 02:54:48 | Re: partition tree inspection functions |