| From: | Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?) |
| Date: | 2025-12-09 21:00:58 |
| Message-ID: | CAOYmi+mrGg+n_X2MOLgeWcj3v_M00gR8uz_D7mM8z=dX1JYVbg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi everybody,
We introduced the libpq-oauth module late in the cycle for PG18, and
to put it mildly, its interface isn't great. The original
implementation depended on libpq internals, and we had to make sure
that we didn't start crashing during major or minor version upgrades.
So there were a bunch of compromises made to keep things safe,
including the restriction that the module name has to change every
major release.
Separately, but closely related: PG18's OAuth support allows you to
customize the client flow via a libpq hook function. Third-party
applications can make use of that, but our own utilities can still
only use the builtin device flow. That's annoying.
I've been working to replace the internal ABI with a stable one,
hopefully to solve both problems at the same time. A dlopen() is a
pretty clear seam for other people to use to modify and extend.
Unfortunately my first attempt (not pictured) ended up in a rabbit
hole, because I tried to tackle the third-party use case first. My
second attempt, attached, focuses on the ABI stabilization instead,
which I think is more likely to succeed.
(This took enough thinking that I'm really glad we didn't try this for
PG18. Thanks for letting me take on some technical debt for a bit.)
= Design =
Here's the train of thought behind the core changes (which are in patch 0004):
The builtin-flow code in fe-auth-oauth.c is similar to the custom-flow
code, but it's just ever-so-slightly different. I'd like to unify the
two, so I want libpq-oauth to make use of the public
PGoauthBearerRequest API, and that means that almost all of the
injections made in the PG18 ABI need to be replaced.
Most of those injections are simply subsumed by the hook API
(hooray!). A couple of others can be replaced by PQconninfo(). Four
are left over:
- pgthreadlock_t
- libpq_gettext
- conn->errorMessage
- conn->oauth_issuer_id
I think we should keep injecting libpq_gettext; no third-party
implementations would be able to use that. And application hooks are
presumably capable of figuring out top-level locking already, since
the application has to have called PQregisterThreadLock() if it needed
to coordinate with libpq.
That leaves error messages and the issuer identifier. I think both
would be useful for hooks to have, too, so I'd like to add them to
PQAUTHDATA_OAUTH_BEARER_TOKEN.
= PQAUTHDATA_OAUTH_BEARER_TOKEN, version 2 =
My original plan for authdata extensions was to add new members to the
end of the structs that libpq passes into the hook. Applications would
gate on a feature macro during compilation to see whether they could
use the new members. That should work fine for an application hook;
you're not allowed to downgrade libpq past the version that your
applications are compiled against, lest you lose symbols (or other
feature-flag functionality) you're relying on.
Plugins, unfortunately, can't rely on the feature macro. As we found
out during the libpq-oauth split [1], we have to handle a long-running
application with an old libpq that loads an upgraded libpq-oauth, even
if the OS package dependencies suggest otherwise. (A plugin
architecture flips the direction of the runtime dependency arrow.)
There are a couple ways we could handle this. For this draft, I've
implemented what I think is a middle ground between verbosity and
type-safety: introduce a new V2 struct that "inherits" the V1 struct
and can be down-cast in the callbacks, kinda similar to our Node
hierarchy. We could go even more verbose, and duplicate the entire
PGoauthBearerToken struct -- matching the callback parameter types for
maximum safety -- but I'm not convinced that this would be a good use
of maintenance effort. The ability to cast between the two means we
can share v1 and v2 implementations in our tests.
We could also just add the new members at the end, say that you're
only allowed to use them if the V2 hook type is in use, and scribble
on them in V1 hooks to try to get misbehaving implementations to crash
outright. This arguably has the same amount of type-safety as the
downcast, and the resulting code looks nicer. (The libcurl API we use
does something similar with curl_version_info().) But it is definitely
more "magic".
Also of note: this adds a PQExpBuffer to libpq-fe.h. Technically, that
type is "internal". But... is it really, though? It doesn't seem
possible for us to make incompatible changes there without crashing
earlier psqls, in which case I would like to make use of it too. Maybe
this deserves its own minithread.
Okay, on to the full patchset.
= Roadmap: Prep =
The first few patches are bugfixes I intend to backpatch to 18.
- 0001: I stomped on the SOCKTYPE name in libpq-fe.h, but that's not
in our namespace and it's conceivable that it might collide with
someone else. (It collided with my own test code during my work on
this.)
- 0002 fixes a copy-paste bug in meson.build, which luckily hadn't
caused problems yet.
- 0003 ports Tom's debug2 fix for Test::Cluster::connect_fails() over
to 002_client.pl. (Currently, log checks in this test aren't made
after connection failures, but I don't really want to chase that down
later after I've forgotten about it.)
= Roadmap: Implementation =
Next three patches are the core implementation, which stabilizes the
ABI for libpq-oauth. I feel fairly confident that this, or something
close to it, could land in PG19.
- 0004 introduces the new PQAUTHDATA_OAUTH_BEARER_TOKEN_V2 API.
As described above.
- 0005 makes use of the new API in libpq-oauth.
This removes more code than it introduces, which is exciting.
Now we can rename libpq-oauth-19.so to just libpq-oauth.so, since we
no longer depend on anything that could change between major versions.
(We still need lock and locale support from libpq, as mentioned above,
so they continue to be decoupled via injection at init time.)
Some of the code in this patch overlaps with the translation fixes
thread [2], which I need to get to first. I'm hoping some additional
simplifications can be made after those localization bugs are fixed.
I think I'd also like to get the threadlock into the module API (but
not the hook API). A third-party flow might need to perform its own
one-time initialization, and if it relies on an old version of Curl
(or worse, Kerberos [3]), it'll need to use the same lock that the
top-level application has registered for libpq. So I imagine I'll need
to break out an initialization function here. Alternatively, I could
introduce an API into libpq to retrieve the threadlock function in
use?
- 0006 removes a potential ABI-compatibility pitfall for future devs.
libpq-oauth needs to use the shared-library variant of libpgcommon,
but it can no longer assume that the encoding support exported by
libpq is compatible. So it must not accidentally link against those
functions (see [4]). I don't imagine anyone will try adding code that
does this in practice; we're pretty UTF8-centric in OAuth. But just to
be safe, define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries
will fail the build.
= Roadmap: Plugins? =
So now we have a stable ABI, which technically means that any
enterprising developer who wants to play games with LD_LIBRARY_PATH
could implement their own version of an OAuth flow, and have our
utilities make use of it into the future.
That brings us to patch 0007, which experimentally promotes the stable
API to a public header, and introduces a really janky environment
variable so that people don't have to play games. It will be obvious
from the code that this is not well-baked yet.
I hope the ability to dlopen() a custom flow can make it for 19; I
just don't know that this envvar approach is any good. The ideal
situation, IMO, is for a flow to be selected in the connection string.
But we have to lock that down, similarly to how we protect
local_preload_libraries, to prevent horrible exploits. At which point
we'll have essentially designed a generic libpq plugin system. Not
necessarily a terrible thing, but I don't think I have time to take it
on for PG19.
WDYT?
--Jacob
[1] https://postgr.es/m/aAkJnDQq3mOUvmQV%40msg.df7cb.de
[2] https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB16907.jpnprd01.prod.outlook.com
[3] https://postgr.es/m/aSSp03wmNMngi/Oe%40ubby
[4] https://postgr.es/c/b6c7cfac8
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-libpq-fe.h-Don-t-claim-SOCKTYPE-in-the-global-namesp.patch | application/octet-stream | 2.9 KB |
| 0002-libpq-oauth-use-correct-c_args-in-meson.build.patch | application/octet-stream | 1.2 KB |
| 0003-oauth_validator-Avoid-races-in-log_check.patch | application/octet-stream | 2.9 KB |
| 0004-libpq-Introduce-PQAUTHDATA_OAUTH_BEARER_TOKEN_V2.patch | application/octet-stream | 25.1 KB |
| 0005-libpq-oauth-Use-the-PGoauthBearerRequestV2-API.patch | application/octet-stream | 44.2 KB |
| 0006-libpq-oauth-Never-link-against-libpq-s-encoding-func.patch | application/octet-stream | 3.0 KB |
| 0007-WIP-Introduce-third-party-OAuth-flow-plugins.patch | application/octet-stream | 36.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thomas Munro | 2025-12-09 21:03:07 | Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain |
| Previous Message | Heikki Linnakangas | 2025-12-09 21:00:28 | Re: AIX support |