Re: Proposals for making it easier to write correct bgworkers

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposals for making it easier to write correct bgworkers
Date: 2020-09-10 07:40:15
Message-ID: CABUevEy6TY9KjYMDm4=+z1AnDOZL_iroQSBgTROF-QCLasGMcQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 10, 2020 at 5:02 AM Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> Hi all
>
> As I've gained experience working on background workers, it's become
> increasingly clear that they're a bit too different to normal backends for
> many nontrivial uses.
>

<snip> a lot of proposals I agree with.

> PROPOSED GENERALISED WORKER MANAGEMENT
> ----
>
> Finally I'm wondering if there's any interest in generalizing the logical
> rep worker management for other bgworkers. I've done a ton of work with
> worker management and it's something I'm sure I could take on but I don't
> want to write it without knowing there's some chance of acceptance.
>
> The general idea is to provide a way for bgworkers to start up managers
> for pools / sets of workers. They launch them and have a function they can
> call in their mainloop that watches their child worker states, invoking
> callbacks when they fail to launch, launch successfully, exit cleanly after
> finishing their work, or die with an error. Workers are tracked in a shmem
> seg where the start of the seg must be a key struct (akin to how the hash
> API works). We would provide calls to look up a worker shmem struct by key,
> signal a worker by key, wait for a worker to exit (up to timeout), etc.
> Like in the logical rep code, access to the worker registration shmem would
> be controlled by LWLock. The extension code can put whatever it wants in
> the worker shmem entries after the key, including various unions or
> whatever - the worker management API won't care.
>
> This abstracts all the low level mess away from bgworker implementations
> and lets them focus on writing the code they want to run.
>
> I'd probably suggest doing so by extracting the logical rep worker
> management, and making the logical rep code use the generalized worker
> management. So it'd be proven, and have in-core users.
>

Yes, there is definitely a lot of interest in this.
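
To make the quoted design a bit more concrete, here is a rough, standalone
sketch of the shape such an API might take. Every name in it (WorkerKey,
WorkerHeader, registry_add, registry_lookup, registry_poll, DemoEntry) is
hypothetical and is not an existing PostgreSQL interface. An ordinary array
stands in for the shmem segment, and the LWLock that would guard it is only
noted in comments. Each entry starts with the key struct, and the extension
is free to put whatever it wants after it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; this is not a PostgreSQL API. */
typedef enum WorkerState {
    W_UNUSED = 0, W_STARTING, W_RUNNING, W_EXITED_OK, W_FAILED
} WorkerState;

/* Every entry must begin with the key, akin to the hash API. */
typedef struct WorkerKey { int db_id; int task_id; } WorkerKey;

typedef struct WorkerHeader {
    WorkerKey   key;
    WorkerState state;     /* updated by the worker itself */
    WorkerState reported;  /* last state the manager has reported */
} WorkerHeader;

typedef void (*WorkerEventCb)(WorkerHeader *entry, WorkerState newstate,
                              void *arg);

typedef struct WorkerRegistry {
    char         *base;        /* stands in for the shmem segment */
    size_t        entry_size;  /* size of the extension's full entry */
    int           nslots;
    WorkerEventCb on_event;
    void         *arg;
} WorkerRegistry;

/* Example extension entry: header first, private payload after it. */
typedef struct DemoEntry { WorkerHeader hdr; int rows_done; } DemoEntry;

/* Example callback: tally events per state into an int[5] passed as arg. */
static void count_events(WorkerHeader *entry, WorkerState newstate, void *arg)
{
    (void) entry;
    ((int *) arg)[newstate]++;
}

static WorkerHeader *registry_slot(WorkerRegistry *reg, int i)
{
    return (WorkerHeader *) (reg->base + (size_t) i * reg->entry_size);
}

static void registry_init(WorkerRegistry *reg, void *mem, size_t entry_size,
                          int nslots, WorkerEventCb cb, void *arg)
{
    assert(entry_size >= sizeof(WorkerHeader));
    memset(mem, 0, entry_size * (size_t) nslots);  /* all slots W_UNUSED */
    reg->base = mem;
    reg->entry_size = entry_size;
    reg->nslots = nslots;
    reg->on_event = cb;
    reg->arg = arg;
}

/* Look up a live entry by key. Real code would hold the lock here. */
static WorkerHeader *registry_lookup(WorkerRegistry *reg, WorkerKey key)
{
    for (int i = 0; i < reg->nslots; i++) {
        WorkerHeader *h = registry_slot(reg, i);
        if (h->state != W_UNUSED &&
            h->key.db_id == key.db_id && h->key.task_id == key.task_id)
            return h;
    }
    return NULL;
}

/* Claim a free slot for a new worker; NULL if the key exists or no room. */
static WorkerHeader *registry_add(WorkerRegistry *reg, WorkerKey key)
{
    if (registry_lookup(reg, key) != NULL)
        return NULL;
    for (int i = 0; i < reg->nslots; i++) {
        WorkerHeader *h = registry_slot(reg, i);
        if (h->state == W_UNUSED) {
            h->key = key;
            h->state = W_STARTING;
            h->reported = W_UNUSED;
            return h;
        }
    }
    return NULL;
}

/* Called from the manager's main loop: fire one callback per state change,
 * and free the slot once a worker has exited or failed. Returns the number
 * of callbacks fired. */
static int registry_poll(WorkerRegistry *reg)
{
    int fired = 0;
    for (int i = 0; i < reg->nslots; i++) {
        WorkerHeader *h = registry_slot(reg, i);
        if (h->state == h->reported)
            continue;
        WorkerState now = h->state;
        reg->on_event(h, now, reg->arg);
        fired++;
        if (now == W_EXITED_OK || now == W_FAILED)
            memset(h, 0, reg->entry_size);
        else
            h->reported = now;
    }
    return fired;
}
```

In this sketch a manager would call registry_add() when it launches a
worker and registry_poll() once per main-loop iteration. Signalling a worker
by key and waiting for exit with a timeout are left out entirely.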

It would also be good to somehow generalize away the difference between
static bgworkers and dynamic ones. That's something that really annoyed us
with the work on the "online checksums" patch, and I've also run into that
issue in other cases. I think finding a way to launch a dynamic worker out
of the postmaster would be a way to do that -- I haven't looked into the
details, but if we're generalizing the worker management, this is
definitely something we should consider.

I haven't looked at the different places we could, in theory, extract the
management code from and reuse, but it makes sense that the logical
replication one would be the most appropriate since it's the newest one (vs
autovacuum, which is the other one that can at least do similar things). And
yes, it definitely makes sense to have a generalized set of code for this,
because it's certainly a fairly complicated pattern that we shouldn't be
re-inventing over and over again with slightly different bugs.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
