Re: [PATCH] More docs on what to do and not do in extension code

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] More docs on what to do and not do in extension code
Date: 2021-05-30 11:19:55
Message-ID: 07de8fed255f31aa2dfda8efbe839df463f143c6.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2021-01-18 at 15:56 +0800, Craig Ringer wrote:
> The attached patch expands the xfunc docs and bgworker docs a little, providing a starting point for developers
> to learn how to do some common tasks the Postgres Way.

I like these changes!

Here is a review:

+ <para>
+ See <xref linkend="xfunc-shared-addin"/> for information on how to
+ request extension shared memory allocations, <literal>LWLock</literal>s,
+ etc.
+ </para>

This doesn't sound very English to me, and I don't see the point in
repeating parts of the enumeration. How about

See ... for detailed information how to allocate these resources.

+ <para>
+ If a background worker needs to sleep or wait for activity it should

Missing comma after "activity".

+ always use <link linkend="xfunc-sleeping-interrupts-blocking">PostgreSQL's
+ interupt-aware APIs</link> for the purpose. Do not <function>usleep()</function>,
+ <function>system()</function>, make blocking system calls, etc.
+ </para>

"system" is not a verb. Suggestion:

Do not use <function>usleep()</function>, <function>system()</function>
or other blocking system calls.

+ <para>
+ The <filename>src/test/modules/worker_spi</filename> and
+ <filename>src/test/modules/test_shm_mq</filename> contain working examples
+ that demonstrates some useful techniques.
</para>

That is missing a noun in my opinion, I would prefer:

The modules ... contain working examples ...

+ <sect1 id="bgworker-signals" xreflabel="Signal Handling in Background Workers">
+ <title>Signal Handling in Background Workers</title>

It is not a good idea to start a section in the middle of a documentation page.
That will lead to a weird TOC entry at the top of the page.

The better way to do this is to write a short introductory header and convert
most of the first half of the page into another section, so that we end up
with a page the has the introductory material and two TOC entries for the details.

+ The default signal handlers installed for background workers <emphasis>do
+ not interrupt queries or blocking calls into other postgres code</emphasis>

<productname>PostgreSQL</productname>, not "postgres".
Also, there is a comma missing at the end of the line.

+ so they are only suitable for simple background workers that frequently and
+ predictably return control to their main loop. If your worker uses the
+ default background worker signal handling it should call

Another missing comma after "handling".

+ <function>HandleMainLoopInterrupts()</function> on each pass through its
+ main loop.
+ </para>
+
+ <para>
+ Background workers that may make blocking calls into core PostgreSQL code
+ and/or run user-supplied queries should generally replace the default bgworker

Please stick with "background worker", "bgworker" is too sloppy IMO.

+ signal handlers with the handlers used for normal user backends. This will
+ ensure that the worker will respond in a timely manner to a termination
+ request, query cancel request, recovery conflict interrupt, deadlock detector
+ request, etc. To install these handlers, before unblocking interrupts
+ run:

The following would be more grammatical:

To install these handlers, run the following before unblocking interrupts:

+ Then ensure that your main loop and any other code that could run for some
+ time contains <function>CHECK_FOR_INTERRUPTS();</function> calls. A
+ background worker using these signal handlers must use <link
+ linkend="xfunc-resource-management">PostgreSQL's resource management APIs
+ and callbacks</link> to handle cleanup rather than relying on control
+ returning to the main loop because the signal handlers may call

There should be a comma before "because".

+ <function>proc_exit()</function> directly. This is recommended practice
+ for all types of extension code anyway.
+ </para>
+
+ <para>
+ See the comments in <filename>src/include/miscadmin.h</filename> in the
+ postgres headers for more details on signal handling.
+ </para>

"in the postgres headers" is redundant - at any rate, it should be "PostgreSQL".

+ Do not attempt to use C++ exceptions or Windows Structured Exception
+ Handling, and never call <function>exit()</function> directly.

I am alright with this addition, but I think it would be good to link to
<xref linkend="extend-cpp"/> from it.

+ <listitem id="xfunc-threading">
+ <para>
+ Individual PostgreSQL backends are single-threaded.
+ Never call any PostgreSQL function or access any PostgreSQL-managed data
+ structure from a thread other than the main

"PostgreSQL" should always have the <productname> tag.
This applies to a lot of places in this patch.

+ thread. If at all possible your extension should not start any threads

Comma after "possible".

+ and should only use the main thread. PostgreSQL generally uses subprocesses

Hm. If the extension does not start threads, it automatically uses the main thread.
I think that should be removed for clarity.

+ that coordinate over shared memory instead of threads - see
+ <xref linkend="bgworker"/>.

It also uses signals and light-weight locks - but I think that you don't need to
describe the coordination mechanisms here, which are explained in the link you added.

+ primitives like <function>WaitEventSetWait</function> where necessary. Any
+ potentially long-running loop should periodically call <function>
+ CHECK_FOR_INTERRUPTS()</function> to give PostgreSQL a chance to interrupt
+ the function in case of a shutdown request, query cancel, etc. This means

Are there other causes than shutdown or query cancellation?
At any rate, I am not fond of enumerations ending with "etc".

+ you should <function>sleep()</function> or <function>usleep()</function>

You mean: "you should *not* use sleep()"

+ for any nontrivial amount of time - use <function>WaitLatch()</function>

"&mdash;" would be better than "-".

+ or its variants instead. For details on interrupt handling see

Comma after "handling".

[...]
+ based cleanup. Your extension function could be terminated mid-execution

... could be terminated *in* mid-execution ...

+ by PostgreSQL if any function that it calls makes a
+ <function>CHECK_FOR_INTERRUPTS()</function> check. It may not

"makes" sound kind of clumsy in my ears.

+ Spinlocks, latches, condition variables, and more. Details on all of these
+ is far outside the scope of this document, and the best reference is
+ the relevant source code.

I don't think we have to add that last sentence. That holds for pretty much
everything in this documentation.

+ <para>
+ Sometimes relation-based state management for extensions is not
+ sufficient to meet their needs. In that case the extension may need to

Better:
Sometimes, relation-based state management is not sufficient to meet the
needs of an extension.

+ use PostgreSQL's shared-memory based inter-process communication
+ features, and should certainly do so instead of inventing its own or
+ trying to use platform level features. An extension may use
+ <link linkend="xfunc-shared-addin">"raw" shared memory requested from
+ the postmaster at startup</link> or higher level features like dynamic
+ shared memory segments (<acronym>DSM</acronym>),
+ dynamic shared areas (<acronym>DSA</acronym>),
+ or shared memory message queues (<acronym>shm_mq</acronym>). Examples
+ of the use of some these features can be found in the
+ <filename>src/test/modules/test_shm_mq/</filename> example extension. Others
+ can be found in various main PostgreSQL backend code.
+ </para>

Instead of the last sentence, I'd prefer
... or other parts of the source code.

+ <listitem id="xfunc-relcache-syscache">
+ <para>
+ Look up system catalogs and table information using the relcache and syscache

How about "table metadata" rather than "table information"?

+ APIs (<function>SearchSysCache...</function>,
+ <function>relation_open()</function>, etc) rather than attempting to run
+ SQL queries to fetch the information. Ensure that your function holds
+ any necessary locks on the target objects. Take care not to make any calls

... holds *the* necessary locks ...

+ that could trigger cache invalidations while still accessing any
+ syscache cache entries, as this can cause subtle bugs. See

Subtle? Perhaps "bugs that are hard to find".

+ <filename>src/backend/utils/cache/syscache.c</filename>,
+ <filename>src/backend/utils/cache/relcache.c</filename>,
+ <filename>src/backend/access/common/relation.c</filename> and their
+ headers for details.
+ </para>
+ </listitem>

Attached is a new version that has my suggested changes, plus a few from
Bharath Rupireddy (I do not agree with many of his suggestions).

Yours,
Laurenz Albe

Attachment Content-Type Size
v2-0002-Expand-docs-on-PostgreSQL-extension-coding.patch text/x-patch 15.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Asif Rehman 2021-05-30 12:38:36 Re: [PATCH] expand the units that pg_size_pretty supports on output
Previous Message Fabien COELHO 2021-05-30 09:09:41 psql - factor out echo code