Re: Extensible Rmgr for Table AMs

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Extensible Rmgr for Table AMs
Date: 2022-02-04 14:48:01
Message-ID: 20220204144801.wuwcansdzz3w2nn3@jrouhaud
Lists: pgsql-hackers

Hi,

On Fri, Feb 04, 2022 at 09:10:42AM -0500, Robert Haas wrote:
> On Thu, Feb 3, 2022 at 12:34 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > I agree that having dozens of custom rmgrs doesn't seem likely, but I also have
> > no idea of how much overhead you get by not doing a direct array access. I
> > think it would be informative to benchmark something like simple OLTP write
> > workload on a fast storage (or a ramdisk, or with fsync off...), with the used
> > rmgr being the 1st and the 2nd custom rmgr. Both scenarios still seem
> > plausible and shouldn't degenerate on good hardware.
>
> I think it would be hard to measure the overhead of this approach on a
> macrobenchmark.

Yeah, that was also my initial thought, but I wouldn't be terribly surprised to
be wrong.

> That having been said, I find this a surprising
> implementation choice. I think that the approaches that are most worth
> considering are:
>
> (1) reallocate the array if needed so that we can continue to just do
> RmgrTable[rmid]
> (2) have one array for builtins and a second array for extensions and
> do rmid < RM_CUSTOM_MIN_ID ? BuiltinRmgrTable[rmid] :
> ExtensionRmgrTable[rmid]
> (3) change RmgrTable to be an array of pointers to structs rather than
> an array of structs. Then the structs don't move around and can be
> const, but the pointers can be moved into a larger array if required
>
> I'm not really sure which is best. My intuition for what will be
> cheapest on modern hardware is pretty shaky. However, I can't see how
> it can be the thing the patch is doing now; a linear search seems like
> it has to be the slowest option.
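
For reference, option (2) could look roughly like the sketch below. This is
only an illustration under my own assumptions, not what the patch does:
DemoRmgr and all the demo_* names are made up, the two builtin names are just
sample entries, and I subtract RM_CUSTOM_MIN_ID from the index so that the
extension array doesn't carry the unused low slots.

```c
#include <assert.h>
#include <stdint.h>

#define RM_CUSTOM_MIN_ID 128        /* value from the patch */
#define RM_CUSTOM_MAX_ID UINT8_MAX  /* value from the patch */
#define DEMO_NUM_CUSTOM (RM_CUSTOM_MAX_ID - RM_CUSTOM_MIN_ID + 1)

/* Hypothetical stand-in for RmgrData; only a name field is kept. */
typedef struct DemoRmgr
{
    const char *rm_name;
} DemoRmgr;

/* Builtins stay const; unused slots are zero-initialized. */
static const DemoRmgr demo_builtin[RM_CUSTOM_MIN_ID] = {
    [0] = {"XLOG"},
    [1] = {"Transaction"},
};

/* Extensions get their own array, indexed by (rmid - RM_CUSTOM_MIN_ID). */
static DemoRmgr demo_extension[DEMO_NUM_CUSTOM];

/* Register a custom rmgr under an id in the custom range. */
static void
demo_register(uint8_t rmid, const char *name)
{
    assert(rmid >= RM_CUSTOM_MIN_ID);
    demo_extension[rmid - RM_CUSTOM_MIN_ID].rm_name = name;
}

/* One comparison plus one array access, no search. */
static const DemoRmgr *
demo_lookup_two_arrays(uint8_t rmid)
{
    return rmid < RM_CUSTOM_MIN_ID
        ? &demo_builtin[rmid]
        : &demo_extension[rmid - RM_CUSTOM_MIN_ID];
}
```

With this layout the lookup cost doesn't depend on how many custom rmgrs are
registered, at the price of a branch on every access.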

I guess the idea was a compromise: let rmgr authors choose arbitrary ids to
avoid any conflicts, especially with private implementations, without wasting
too much memory. But the approaches you list would be pretty much incompatible
with the current definition:

+#define RM_CUSTOM_MIN_ID 128
+#define RM_CUSTOM_MAX_ID UINT8_MAX

Even if you only allocate up to the highest id found, nothing guarantees that
this id won't be quite high.
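
To put numbers on that, here is a small sketch of how many slots each layout
needs for a given highest registered id. The demo_* helpers are hypothetical
and only count slots; how much memory that really is depends on
sizeof(RmgrData), which I haven't measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RM_CUSTOM_MIN_ID 128        /* value from the patch */
#define RM_CUSTOM_MAX_ID UINT8_MAX  /* value from the patch */

/* Slots in a single directly indexed table covering ids 0..max_rmid. */
static size_t
demo_slots_direct(uint8_t max_rmid)
{
    return (size_t) max_rmid + 1;
}

/*
 * Slots in a separate extension-only table covering ids
 * RM_CUSTOM_MIN_ID..max_rmid, i.e. indexed by (rmid - RM_CUSTOM_MIN_ID).
 */
static size_t
demo_slots_extension(uint8_t max_rmid)
{
    assert(max_rmid >= RM_CUSTOM_MIN_ID);
    return (size_t) max_rmid - RM_CUSTOM_MIN_ID + 1;
}
```

So a single extension that picks RM_CUSTOM_MAX_ID forces the full range to be
allocated in either layout, even though only one slot is in use.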
