Re: Extensible Rmgr for Table AMs

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Extensible Rmgr for Table AMs
Date: 2022-04-05 09:42:09
Message-ID: CALj2ACWxMsQfjzWRLb7jarGsnGXJX_PHvBz4tni-P7tDbpO3gQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 5:55 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2022-04-04 at 11:15 +0530, Bharath Rupireddy wrote:
> > 1) Why can't rmid be chosen by the extensions in sequential order
> > from
> > (129 till 255), say, to start with a columnar extension can choose
> > 129, another extension can register 130 and so on right?
>
> I'm not sure what you mean by "chosen by the extensions in sequential
> order". If you mean:
>
> (b) that there should be a convention where people reserve IDs on the
> wiki page in sequential order; then that's fine with me

Yes, I meant this. If we do this, then a global variable
current_max_rmid can be maintained (by RegisterCustomRmgr) and the
other functions RmgrStartup, RmgrCleanup and RegisterCustomRmgr can
avoid for-loops with full range RM_MAX_ID (for (int id = 0; id <
current_max_rmid; id++)).

> > 3) Do we need to talk about extensible rmgrs and
> > https://wiki.postgresql.org/wiki/ExtensibleRmgr in documentation
> > somewhere? This will help extension developers to refer when needed.
>
> We don't have user-facing documentation for every extension hook, and I
> don't think it would be a good idea to document this one (at least in
> the user-facing docs). Otherwise, we'd have to document the whole
> resource manager interface, and that seems like way too much of the
> internals.

With the custom rmgrs, now it's time to have it in the documentation
(probably not immediately, even after this patch gets in) describing
what rmgrs are, what are built-in and custom rmgrs and how and when an
extension needs to register, load and use rmgrs etc.

> > 4) Do we need to change this description?
> > <para>
> > <literal>generic</literal>. Onl
> > superusers can change this setting.
>
> Yes, you're right, I updated the docs for pg_waldump and the
> wal_consistency_checking GUC.

+ <para>
+ Extensions may define additional resource managers with modules loaded
+ in <xref linkend="guc-shared-preload-libraries"/>. However, the names
+ of the custom resource managers are not available at the time the

+ errhint("Include the extension module that implements this resource
manager in shared_preload_libraries.")));

With the above, do you mean to say that only the extensions that load
their library via shared_preload_libraries are allowed to have custom
rmgrs? How about extensions using {session, local}_preload_libraries,
LOAD command? How about extensions that don't load the shared library
via {session, local, shared}_preload_libraries or LOAD command and let
any of its functions load the shared library on first use?

> > 5) Do we need to add a PGDLLIMPORT qualifier to RegisterCustomRmgr()
> > (possibly for RmgrStartup, RmgrTable, RmgrCleanup as well?) so that
> > the Windows versions of extensions can use this feature?
>
> That seems to only be required for variables, not functions.

You are right. For instance, the functions DefineCustomXXX are not
using PGDLLIMPORT qualifer, I believe they can be used by extensions
on WINDOWS.

> > 6) Should the below strcmp pg_strcmp? The custom rmgrs can still use
> > rm_name with different cases and below code fail to catch it?
> > + if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
> > + ereport(PANIC,
> > + (errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
> > + existing_rmgr->rm_name)));
>
> There are already a lot of places where users can choose identifiers
> that differ only in uppercase/lowercase. I don't see a reason to make
> an exception here.

I think postgres parses the user-specified strings and converts them
to lowercase unless they are double-quoted, that is why we see strcmp,
not pg_strcasecmp. But, the custom rmgr name isn't parsed by postgres
right?

For instance - extension 1, with id 130 and name 'rmgr_foo' and
extension 2 with id 131 and name 'Rmgr_foo'/'RMGR_FOO'...., then the
following code may not catch it right? Do you want rmgr names to be
case-sensitive/insensitive?

+ if (!strcmp(RmgrTable[i].rm_name, rmgr->rm_name))
+ ereport(PANIC,
+ (errmsg("failed to register custom resource manager \"%s\" with ID
%d", rmgr->rm_name, rmid),
+ errdetail("Existing resource manager with ID %d has the same name.", i)));

> > 8) Per https://wiki.postgresql.org/wiki/ExtensibleRmgr, 128 is
> > reserved, can we have it as a macro? Or something like
> > (RM_MAX_ID/2+1)
>
> It's already defined as RM_EXPERIMENTAL_ID. Is that what you meant or
> did I misunderstand?

+#define RM_MAX_ID UINT8_MAX
+#define RM_MAX_BUILTIN_ID (RM_NEXT_ID - 1)
+#define RM_MIN_CUSTOM_ID 128
+#define RM_N_CUSTOM_IDS (RM_MAX_ID - RM_MIN_CUSTOM_ID + 1)
+#define RMID_IS_BUILTIN(rmid) ((rmid) <= RM_MAX_BUILTIN_ID)
+#define RMID_IS_CUSTOM(rmid) ((rmid) >= RM_MIN_CUSTOM_ID && (rmid) < RM_MAX_ID)
+#define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) || RMID_IS_CUSTOM((rmid)))

To me the above seems complex, why can't we just have the following,
it's okay even if we don't use some of the macros in the *.c files?
total rmgrs = 256
built in = 128 (index 0 - 127)
custom = 128 (index 128 - 255)

#define RM_N_IDS (UINT8_MAX + 1)
#define RM_N_BUILTIN_IDS (RM_N_IDS / 2)
#define RM_N_CUSTOM_IDS (RM_N_IDS / 2)
#define RM_MAX_ID (RM_N_IDS - 1)
#define RM_MAX_BUILTIN_ID (RM_NEXT_ID - 1)
#define RM_MIN_CUSTOM_ID (RM_N_IDS / 2)
#define RM_MAX_CUSTOM_ID RM_MAX_ID
#define RMID_IS_BUILTIN(rmid) ((rmid) <= RM_MAX_BUILTIN_ID)
#define RMID_IS_CUSTOM(rmid) ((rmid) >= RM_MIN_CUSTOM_ID && (rmid) <=
RM_MAX_CUSTOM_ID)
#define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) || RMID_IS_CUSTOM((rmid)))

If okay, you can specify in a comment that out of UINT8_MAX + 1 rmgrs,
first half rmgr ids are supposed to be reserved for built in rmgrs and
second half rmgr ids for extensions.

> I don't see the point in defining it as RM_MAX_ID/2+1.
>
> > 9) Thinking if there's a way to test the code, maybe exposing
> > RegisterCustomRmgr as a function? I think we need to have a dummy
> > extension that will be used for testing this kind of patches/features
> > but that's a separate discussion IMO.
>
> It's already exposed as a C function in xlog_internal.h.
>
> You mean as a SQL function? I don't think that makes sense. It can only
> be called while processing shared_preload_libraries (on server startup)
> anyway.

Yes, an SQL function showing all of the available rmgrs. With the
custom rmgrs, I think, an SQL function like pg_resource_managers()
returning a set of {name, id, (if possible - loaded_by_extension_name,
rm_redo, rm_desc, rm_identify ... api names)}

Some more comments:

1) Is there any specific reason to have a function to emit a PANIC
message? And it's being used only in one place in GetRmgr.
+void
+RmgrNotFound(RmgrId rmid)

2) How about "If rm_name is NULL, the corresponding RmgrTable[] entry
is considered invalid."?
+ * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h). If rm_name is
+ * NULL, the structure is considered invalid.

3) How about adding errdetail something like "Custom resource manager
ID must be between %d and %d (both inclusive)", RM_MAX_BUILTIN_ID,
RM_MAX_ID)?
+ if (rmid < RM_MIN_CUSTOM_ID)
+ ereport(PANIC, errmsg("custom resource manager ID %d is out of range", rmid));

4) How about adding errdetail something like "Custom resource manager
must have a name"
+ if (rmgr->rm_name == NULL)
+ ereport(PANIC, errmsg("custom resource manager is invalid"));

5) How about "successfully registered custom resource manager"?
+ (errmsg("registered custom resource manager \"%s\" with ID %d",

6) Shouldn't this be ri <= RM_NEXT_ID?
- for (ri = 0; ri < RM_NEXT_ID; ri++)
+ for (ri = 0; ri < RM_MAX_ID; ri++)

- for (ri = 0; ri < RM_NEXT_ID; ri++)
+ for (ri = 0; ri <= RM_MAX_ID; ri++)

7) Unrelated to this patch, why is there always an extra slot RM_MAX_ID + 1?

-const RmgrDescData RmgrDescTable[RM_MAX_ID + 1] = {
+static const RmgrDescData RmgrDescTable[RM_MAX_BUILTIN_ID + 1] = {
#include "access/rmgrlist.h"

-const RmgrData RmgrTable[RM_MAX_ID + 1] = {
+RmgrData RmgrTable[RM_MAX_ID + 1] = {

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-04-05 09:45:44 Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Previous Message Aleksander Alekseev 2022-04-05 09:38:48 Re: Unit tests for SLRU