Re: [PATCH] More docs on what to do and not do in extension code

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] More docs on what to do and not do in extension code
Date: 2021-01-19 14:33:43
Message-ID: CALj2ACVcfNR9KOgNidR0g1M-d9r+xCKDRXBsOgjD8DENa1hKDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 18, 2021 at 1:27 PM Craig Ringer
<craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> Hi folks
>
> The attached patch expands the xfunc docs and bgworker docs a little, providing a starting point for developers to learn how to do some common tasks the Postgres Way.
>
> It mentions in brief these topics:
>
> * longjmp() based exception handling with elog(ERROR), PG_CATCH() and PG_RE_THROW() etc
> * Latches, spinlocks, LWLocks, heavyweight locks, condition variables
> * shm, DSM, DSA, shm_mq
> * syscache, relcache, relation_open(), invalidations
> * deferred signal handling, CHECK_FOR_INTERRUPTS()
> * Resource cleanup hooks and callbacks like on_exit, before_shmem_exit, the resowner callbacks, etc
> * signal handling in bgworkers
>
> All very superficial, but all things I really wish I'd known a little about, or even that I needed to learn about, when I started working on postgres.
>
> I'm not sure it's in quite the right place. I wonder if there should be a separate part of xfunc.sgml that covers the slightly more advanced bits of postgres backend and function coding like this, lists relevant README files in the source tree, etc.
>
> I avoided going into details like how resource owners work. I don't want the docs to have to cover all that in detail; what I hope to do is start providing people with clear references to the right place in the code, READMEs, etc to look when they need to understand specific topics.

Thanks for the patch.

Here are some comments:

[1]
background worker's main function, and must be unblocked by it; this is to
allow the process to customize its signal handlers, if necessary.
- Signals can be unblocked in the new process by calling
- <function>BackgroundWorkerUnblockSignals</function> and blocked by calling
- <function>BackgroundWorkerBlockSignals</function>.
+ It is important that all background workers set up and unblock signal
+ handling before they enter their main loops. Signal handling in background
+ workers is discussed separately in <xref linkend="bgworker-signals"/>.
</para>

IMO, we can retain the statement about BackgroundWorkerUnblockSignals
and BackgroundWorkerBlockSignals, but mention the link to
"bgworker-signals" for more details and move the statement "it's
important to unblock signals before enter their main loop" to
"bgworker-signals" section and we can also reason there the
consequences if not done.

[2]
+ interupt-aware APIs</link> for the purpose. Do not
<function>usleep()</function>,
+ <function>system()</function>, make blocking system calls, etc.
+ </para>

Is it "Do not use <function>usleep()</function>,
<function>system()</function> or make blocking system calls etc." ?

[3] IMO, we can remove following from "bgworker-signals" if we retain
it where currently it is, as discussed in [1].
+ Signals can be unblocked in the new process by calling
+ <function>BackgroundWorkerUnblockSignals</function> and blocked by calling
+ <function>BackgroundWorkerBlockSignals</function>.

[4] Can we say
+ The default signal handlers set up for background workers <emphasis>do
+ default background worker signal handlers, it should call

instead of
+ The default signal handlers installed for background workers <emphasis>do
+ default background worker signal handling it should call

[5] IMO, we can have something like below
+ request, etc. Set up these handlers before unblocking signals as
+ shown below:

instead of
+ request, etc. To install these handlers, before unblocking interrupts
+ run:

[6] I think logs and errors either elog() or ereport can be used, so how about
+ Use <function>elog()</function> or <function>ereport()</function> for
+ logging output or raising errors instead of any direct stdio calls.

instead of
+ Use <function>elog()</function> and <function>ereport()</function> for
+ logging output and raising errors instead of any direct stdio calls.

[7] Can we use child processes instead of subprocess ? If okay in
other places in the patch as well.
+ and should only use the main thread. PostgreSQL generally
uses child processes
+ that coordinate over shared memory instead of threads - for
instance, see
+ <xref linkend="bgworker"/>.

instead of
+ and should only use the main thread. PostgreSQL generally
uses subprocesses
+ that coordinate over shared memory instead of threads - see
+ <xref linkend="bgworker"/>.

[8] Why should file descriptor manager API be used to execute
subprocesses/child processes?
+ To execute subprocesses and open files, use the routines provided by
+ the file descriptor manager like <function>BasicOpenFile</function>
+ and <function>OpenPipeStream</function> instead of a direct

[9] "should always be"? even if it's a blocking extesion, does it
work? If our intention is to recommend the developers, maybe we should
avoid using the term "should" in the patch in other places as well.
+ Extension code should always be structured as a non-blocking

[10] I think it is
+ you should avoid using <function>sleep()</function> or
<function>usleep()</function>

instead of
+ you should <function>sleep()</function> or
<function>usleep()</function>

[11] I think it is
+ block if this happens. So cleanup of resources is not
entirely managed by PostgreSQL, it
+ must be handled using appropriate callbacks provided by PostgreSQL

instead of
+ block if this happens. So all cleanup of resources not already
+ managed by the PostgreSQL runtime must be handled using appropriate

[12] I think it is
+ found in corresponding PostgreSQL header and source files.

instead of
+ found in the PostgreSQL headers and sources.

[13] I think it is
+ Use PostgreSQL runtime concurrency and synchronisation primitives

+ between the PostgreSQL processes. These include signals and
ProcSignal multiplexed

instead of
+ Use the PostgreSQL runtime's concurrency and synchronisation primitives

+ between PostgreSQL processes. These include signals and
ProcSignal multiplexed

[14] Is it "relation/table based state management"?
+ Sometimes relation-based state management for extensions is not

[15] I think it is
+ use PostgreSQL shared-memory based inter-process communication

instead of
+ use PostgreSQL's shared-memory based inter-process communication

[16] I think it is
+ or shared memory message queues (<acronym>shm_mq</acronym>). Examples
+ usage of some of these features can be found in the
+ <filename>src/test/modules/test_shm_mq/</filename> sample
extension. Others

instead of
+ or shared memory message queues (<acronym>shm_mq</acronym>). Examples
+ of the use of some these features can be found in the
+ <filename>src/test/modules/test_shm_mq/</filename> example
extension. Others

[17] I think it is
+ syscache entries, as this can cause subtle bugs. See

instead of
+ syscache cache entries, as this can cause subtle bugs. See

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2021-01-19 14:45:57 Re: ResourceOwner refactoring
Previous Message Dmitry Dolgov 2021-01-19 14:25:14 Re: [HACKERS] [PATCH] Generic type subscripting