Re: Extensible Rmgr for Table AMs

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Extensible Rmgr for Table AMs
Date: 2022-04-07 06:15:27
Message-ID: 2b6e4c0298b51a7dc32cd74b104f43f67880aedb.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2022-04-05 at 15:12 +0530, Bharath Rupireddy wrote:
> Yes, I meant this. If we do this, then a global variable
> current_max_rmid can be maintained (by RegisterCustomRmgr) and the
> other functions RmgrStartup, RmgrCleanup and RegisterCustomRmgr can
> avoid for-loops with full range RM_MAX_ID (for (int id = 0; id <
> current_max_rmid; id++)).

There are only 255 entries, so the loops in those functions aren't a
significant overhead.

If we expand it, then we can come up a different solution. We can
change it easily enough later, because the WAL format changes between
releases and so do the APIs for any extension that might be using this.

> With the custom rmgrs, now it's time to have it in the documentation
> (probably not immediately, even after this patch gets in) describing
> what rmgrs are, what are built-in and custom rmgrs and how and when
> an
> extension needs to register, load and use rmgrs etc.

Added a documentation section right after Generic WAL.

> With the above, do you mean to say that only the extensions that load
> their library via shared_preload_libraries are allowed to have custom
> rmgrs? How about extensions using {session, local}_preload_libraries,
> LOAD command? How about extensions that don't load the shared library
> via {session, local, shared}_preload_libraries or LOAD command and
> let
> any of its functions load the shared library on first use?

Correct. The call to RegisterCustomRmgr *must* happen in the postmaster
before any other processes (even the startup process) are launched.
Otherwise recovery wouldn't work. The only place to make that happen is
shared_preload_libraries.

> For instance - extension 1, with id 130 and name 'rmgr_foo' and
> extension 2 with id 131 and name 'Rmgr_foo'/'RMGR_FOO'...., then the
> following code may not catch it right? Do you want rmgr names to be
> case-sensitive/insensitive?

You convinced me. pg_waldump.c uses pg_strcasecmp(), so I'll use it as
well.

> #define RM_N_IDS (UINT8_MAX + 1)
> #define RM_N_BUILTIN_IDS (RM_N_IDS / 2)
> #define RM_N_CUSTOM_IDS (RM_N_IDS / 2)
> #define RM_MAX_ID (RM_N_IDS - 1)
> #define RM_MAX_BUILTIN_ID (RM_NEXT_ID - 1)
> #define RM_MIN_CUSTOM_ID (RM_N_IDS / 2)
> #define RM_MAX_CUSTOM_ID RM_MAX_ID
> #define RMID_IS_BUILTIN(rmid) ((rmid) <= RM_MAX_BUILTIN_ID)
> #define RMID_IS_CUSTOM(rmid) ((rmid) >= RM_MIN_CUSTOM_ID && (rmid) <=
> RM_MAX_CUSTOM_ID)
> #define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) ||
> RMID_IS_CUSTOM((rmid)))

I improved this a bit. But I didn't use your version that's based on
division, that was more confusing to me.

> Yes, an SQL function showing all of the available rmgrs. With the
> custom rmgrs, I think, an SQL function like pg_resource_managers()
> returning a set of {name, id, (if possible -
> loaded_by_extension_name,
> rm_redo, rm_desc, rm_identify ... api names)}

Added.

>
> Some more comments:
>
> 1) Is there any specific reason to have a function to emit a PANIC
> message? And it's being used only in one place in GetRmgr.
> +void
> +RmgrNotFound(RmgrId rmid)

Good call -- better to raise an error, and it can get escalated if it
happens in the wrong place. Done.

> 2) How about "If rm_name is NULL, the corresponding RmgrTable[] entry
> is considered invalid."?
> + * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h). If
> rm_name is
> + * NULL, the structure is considered invalid.

Done.

> 3) How about adding errdetail something like "Custom resource manager
> ID must be between %d and %d (both inclusive)", RM_MAX_BUILTIN_ID,
> RM_MAX_ID)?
> + if (rmid < RM_MIN_CUSTOM_ID)
> + ereport(PANIC, errmsg("custom resource manager ID %d is out of
> range", rmid));

Done.

> 4) How about adding errdetail something like "Custom resource manager
> must have a name"
> + if (rmgr->rm_name == NULL)
> + ereport(PANIC, errmsg("custom resource manager is invalid"));

Improved this message.

> 5) How about "successfully registered custom resource manager"?
> + (errmsg("registered custom resource manager \"%s\" with ID %d",

That's a little much, we don't want to sound too excited that it worked
;-)

> 6) Shouldn't this be ri <= RM_NEXT_ID?
> - for (ri = 0; ri < RM_NEXT_ID; ri++)
> + for (ri = 0; ri < RM_MAX_ID; ri++)
>
> - for (ri = 0; ri < RM_NEXT_ID; ri++)
> + for (ri = 0; ri <= RM_MAX_ID; ri++)

That would leave out the custom rmgrs.

> 7) Unrelated to this patch, why is there always an extra slot
> RM_MAX_ID + 1?

RM_MAX_ID is 255, but we need 256 entries from 0-255 (inclusive).

Good suggestions, thank you for the review!

I'm happy with this patch and I committed it. Changes from last
version:

1. I came up with a better solution (based on a suggestion from
Andres) to handle wal_consistency_checking properly. We can't
understand the custom resource managers when the configuration file is
first loaded, so if we enounter an unknown resource manager, I set a
flag and reprocess it after the shared_preload_libraries are loaded.

2. Added pg_get_wal_resource_managers().

3. Documentation.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-04-07 06:28:19 Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
Previous Message Peter Smith 2022-04-07 06:13:56 Re: Handle infinite recursion in logical replication setup