| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?) |
| Date: | 2025-12-12 07:42:25 |
| Message-ID: | 3720B2E1-0B96-4063-8D63-B5AE6AFEA159@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Dec 10, 2025, at 05:00, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> wrote:
>
> Hi everybody,
>
> We introduced the libpq-oauth module late in the cycle for PG18, and
> to put it mildly, its interface isn't great. The original
> implementation depended on libpq internals, and we had to make sure
> that we didn't start crashing during major or minor version upgrades.
> So there were a bunch of compromises made to keep things safe,
> including the restriction that the module name has to change every
> major release.
>
> Separately, but closely related: PG18's OAuth support allows you to
> customize the client flow via a libpq hook function. Third-party
> applications can make use of that, but our own utilities can still
> only use the builtin device flow. That's annoying.
>
> I've been working to replace the internal ABI with a stable one,
> hopefully to solve both problems at the same time. A dlopen() is a
> pretty clear seam for other people to use to modify and extend.
> Unfortunately my first attempt (not pictured) ended up in a rabbit
> hole, because I tried to tackle the third-party use case first. My
> second attempt, attached, focuses on the ABI stabilization instead,
> which I think is more likely to succeed.
>
> (This took enough thinking that I'm really glad we didn't try this for
> PG18. Thanks for letting me take on some technical debt for a bit.)
>
> = Design =
>
> Here's the train of thought behind the core changes (which are in patch 0004):
>
> The builtin-flow code in fe-auth-oauth.c is similar to the custom-flow
> code, but it's just ever-so-slightly different. I'd like to unify the
> two, so I want libpq-oauth to make use of the public
> PGoauthBearerRequest API, and that means that almost all of the
> injections made in the PG18 ABI need to be replaced.
>
> Most of those injections are simply subsumed by the hook API
> (hooray!). A couple of others can be replaced by PQconninfo(). Four
> are left over:
> - pgthreadlock_t
> - libpq_gettext
> - conn->errorMessage
> - conn->oauth_issuer_id
>
> I think we should keep injecting libpq_gettext; no third-party
> implementations would be able to use that. And application hooks are
> presumably capable of figuring out top-level locking already, since
> the application has to have called PQregisterThreadLock() if it needed
> to coordinate with libpq.
>
> That leaves error messages and the issuer identifier. I think both
> would be useful for hooks to have, too, so I'd like to add them to
> PQAUTHDATA_OAUTH_BEARER_TOKEN.
>
> = PQAUTHDATA_OAUTH_BEARER_TOKEN, version 2 =
>
> My original plan for authdata extensions was to add new members to the
> end of the structs that libpq passes into the hook. Applications would
> gate on a feature macro during compilation to see whether they could
> use the new members. That should work fine for an application hook;
> you're not allowed to downgrade libpq past the version that your
> applications are compiled against, lest you lose symbols (or other
> feature-flag functionality) you're relying on.
>
> Plugins, unfortunately, can't rely on the feature macro. As we found
> out during the libpq-oauth split [1], we have to handle a long-running
> application with an old libpq that loads an upgraded libpq-oauth, even
> if the OS package dependencies suggest otherwise. (A plugin
> architecture flips the direction of the runtime dependency arrow.)
>
> There are a couple ways we could handle this. For this draft, I've
> implemented what I think is a middle ground between verbosity and
> type-safety: introduce a new V2 struct that "inherits" the V1 struct
> and can be down-cast in the callbacks, kinda similar to our Node
> hierarchy. We could go even more verbose, and duplicate the entire
> PGoauthBearerToken struct -- matching the callback parameter types for
> maximum safety -- but I'm not convinced that this would be a good use
> of maintenance effort. The ability to cast between the two means we
> can share v1 and v2 implementations in our tests.
>
> We could also just add the new members at the end, say that you're
> only allowed to use them if the V2 hook type is in use, and scribble
> on them in V1 hooks to try to get misbehaving implementations to crash
> outright. This arguably has the same amount of type-safety as the
> downcast, and the resulting code looks nicer. (The libcurl API we use
> does something similar with curl_version_info().) But it is definitely
> more "magic".
>
> Also of note: this adds a PQExpBuffer to libpq-fe.h. Technically, that
> type is "internal". But... is it really, though? It doesn't seem
> possible for us to make incompatible changes there without crashing
> earlier psqls, in which case I would like to make use of it too. Maybe
> this deserves its own minithread.
>
> Okay, on to the full patchset.
>
> = Roadmap: Prep =
>
> The first few patches are bugfixes I intend to backpatch to 18.
>
> - 0001: I stomped on the SOCKTYPE name in libpq-fe.h, but that's not
> in our namespace and it's conceivable that it might collide with
> someone else. (It collided with my own test code during my work on
> this.)
> - 0002 fixes a copy-paste bug in meson.build, which luckily hadn't
> caused problems yet.
> - 0003 ports Tom's debug2 fix for Test::Cluster::connect_fails() over
> to 002_client.pl. (Currently, log checks in this test aren't made
> after connection failures, but I don't really want to chase that down
> later after I've forgotten about it.)
>
> = Roadmap: Implementation =
>
> Next three patches are the core implementation, which stabilizes the
> ABI for libpq-oauth. I feel fairly confident that this, or something
> close to it, could land in PG19.
>
> - 0004 introduces the new PQAUTHDATA_OAUTH_BEARER_TOKEN_V2 API.
>
> As described above.
>
> - 0005 makes use of the new API in libpq-oauth.
>
> This removes more code than it introduces, which is exciting.
>
> Now we can rename libpq-oauth-19.so to just libpq-oauth.so, since we
> no longer depend on anything that could change between major versions.
> (We still need lock and locale support from libpq, as mentioned above,
> so they continue to be decoupled via injection at init time.)
>
> Some of the code in this patch overlaps with the translation fixes
> thread [2], which I need to get to first. I'm hoping some additional
> simplifications can be made after those localization bugs are fixed.
>
> I think I'd also like to get the threadlock into the module API (but
> not the hook API). A third-party flow might need to perform its own
> one-time initialization, and if it relies on an old version of Curl
> (or worse, Kerberos [3]), it'll need to use the same lock that the
> top-level application has registered for libpq. So I imagine I'll need
> to break out an initialization function here. Alternatively, I could
> introduce an API into libpq to retrieve the threadlock function in
> use?
>
> - 0006 removes a potential ABI-compatibility pitfall for future devs.
>
> libpq-oauth needs to use the shared-library variant of libpgcommon,
> but it can no longer assume that the encoding support exported by
> libpq is compatible. So it must not accidentally link against those
> functions (see [4]). I don't imagine anyone will try adding code that
> does this in practice; we're pretty UTF8-centric in OAuth. But just to
> be safe, define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries
> will fail the build.
>
> = Roadmap: Plugins? =
>
> So now we have a stable ABI, which technically means that any
> enterprising developer who wants to play games with LD_LIBRARY_PATH
> could implement their own version of an OAuth flow, and have our
> utilities make use of it into the future.
>
> That brings us to patch 0007, which experimentally promotes the stable
> API to a public header, and introduces a really janky environment
> variable so that people don't have to play games. It will be obvious
> from the code that this is not well-baked yet.
>
> I hope the ability to dlopen() a custom flow can make it for 19; I
> just don't know that this envvar approach is any good. The ideal
> situation, IMO, is for a flow to be selected in the connection string.
> But we have to lock that down, similarly to how we protect
> local_preload_libraries, to prevent horrible exploits. At which point
> we'll have essentially designed a generic libpq plugin system. Not
> necessarily a terrible thing, but I don't think I have time to take it
> on for PG19.
>
> WDYT?
> --Jacob
>
> [1] https://postgr.es/m/aAkJnDQq3mOUvmQV%40msg.df7cb.de
> [2] https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB16907.jpnprd01.prod.outlook.com
> [3] https://postgr.es/m/aSSp03wmNMngi/Oe%40ubby
> [4] https://postgr.es/c/b6c7cfac8
> <0001-libpq-fe.h-Don-t-claim-SOCKTYPE-in-the-global-namesp.patch><0002-libpq-oauth-use-correct-c_args-in-meson.build.patch><0003-oauth_validator-Avoid-races-in-log_check.patch><0004-libpq-Introduce-PQAUTHDATA_OAUTH_BEARER_TOKEN_V2.patch><0005-libpq-oauth-Use-the-PGoauthBearerRequestV2-API.patch><0006-libpq-oauth-Never-link-against-libpq-s-encoding-func.patch><0007-WIP-Introduce-third-party-OAuth-flow-plugins.patch>
Hi Jacob,
This is a solid patch set. Only a few small comments:
1 - 0001
```
/* for PGoauthBearerRequest.async() */
#ifdef _WIN32
-#define SOCKTYPE uintptr_t /* avoids depending on winsock2.h for SOCKET */
+#define PQ_SOCKTYPE uintptr_t /* avoids depending on winsock2.h for SOCKET */
#else
-#define SOCKTYPE int
+#define PQ_SOCKTYPE int
#endif
```
The commit message has explained why SOCKTYPE is temporary and the reason why adding prefix “PG_” is to avoid collisions. But I don’t think code readers will always read commit messages, given the macro is a local and temporary, why adding a prefix starting with a underscore, like “_PQ_SOCKTYPE”, which explicitly indicates the macro is kinda private.
===
0002 & 0003 Looks good.
===
2 - 0004
```
+ * Helper for handling user flow failures. If the implementation put anything
+ * into request->error, it's added to conn->errorMessage here.
```
Typo: put -> puts
3 - 0004
```
+# Make sure the v1 hook continues to work. */
+test(
```
“*/“ in the end of the comment line seems a typo.
4 - 0005
```
+ PQAUTHDATA_OAUTH_BEARER_TOKEN, /* server requests an OAuth Bearer token
+ * (prefer v2, below, instead) */
```
"(prefer v2, below, instead)" looks confusing to me, though I can understand what it means. Maybe make it clearer, like “(v2 is preferred; see below)"
===
0006 Looks good.
===
===
Not reviewing 0007 as it marks with WIP.
===
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2025-12-12 07:54:55 | Re: alter check constraint enforceability |
| Previous Message | Zsolt Parragi | 2025-12-12 07:10:24 | Re: Proposal: Add a callback data parameter to GetNamedDSMSegment |