Re: [PoC] Federated Authn/z with OAUTHBEARER

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, mahendrakar s <mahendrakarforpg(at)gmail(dot)com>, Andrey Chudnovsky <achudnovskij(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "smilingsamay(at)gmail(dot)com" <smilingsamay(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Date: 2024-04-01 22:07:45
Message-ID: CAOYmi+mKUTxCh9vcYnB3+f+6JRjfhAAigV93DsDsWLy=zAR7zg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 28, 2024 at 3:34 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> In jsonapi.c, makeJsonLexContextCstringLen initialize a JsonLexContext with
> palloc0 which would need to be ported over to use ALLOC for frontend code.

Seems reasonable (but see below, too).

> On
> that note, the errorhandling in parse_oauth_json() for content-type checks
> attempts to free the JsonLexContext even before it has been created. Here we
> can just return false.

Agreed. They're zero-initialized, so freeJsonLexContext() is safe
IIUC, but it's clearer not to call the free function at all.

But for these additions:

> - makeJsonLexContextCstringLen(&lex, resp->data, resp->len, PG_UTF8, true);
> + if (!makeJsonLexContextCstringLen(&lex, resp->data, resp->len, PG_UTF8, true))
> + {
> + actx_error(actx, "out of memory");
> + return false;
> + }

...since we're using the stack-based API as opposed to the heap-based
API, they shouldn't be possible to hit. Any failures in createStrVal()
are deferred to parse time on purpose.

> - echo 'libpq must not be calling any function which invokes exit'; exit 1; \
> + echo 'libpq must not be calling any function which invokes exit'; \
> The offending codepath in libcurl was in the NTLM_WB module, a very old and
> obscure form of NTLM support which was replaced (yet remained in the tree) a
> long time ago by a full NTLM implementatin. Based on the findings in this
> thread it was deprecated with a removal date set to April 2024 [0]. A bug in
> the 8.4.0 release however disconnected NTLM_WB from the build and given the
> lack of complaints it was decided to leave as is, so we can base our libcurl
> requirements on 8.4.0 while keeping the exit() check intact.

Of the Cirrus machines, it looks like only FreeBSD has a new enough
libcurl for that. Ubuntu won't until 24.04, Debian Bookworm doesn't
have it unless you're running backports, RHEL 9 is still on 7.x... I
think requiring libcurl 8 is effectively saying no one will be able to
use this for a long time. Is there an alternative?

> + else if (strcasecmp(content_type, "application/json") != 0)
> This needs to handle parameters as well since it will now fail if the charset
> parameter is appended (which undoubtedly will be pretty common). The easiest
> way is probably to just verify the mediatype and skip the parameters since we
> know it can only be charset?

Good catch. application/json no longer defines charsets officially
[1], so I think we should be able to just ignore them. The new
strncasecmp needs to handle a spurious prefix, too; I have that on my
TODO list.

> + /* TODO investigate using conn->Pfdebug and CURLOPT_DEBUGFUNCTION here */
> + CHECK_SETOPT(actx, CURLOPT_VERBOSE, 1L, return false);
> + CHECK_SETOPT(actx, CURLOPT_ERRORBUFFER, actx->curl_err, return false);
> CURLOPT_ERRORBUFFER is the old and finicky way of extracting error messages, we
> should absolutely move to using CURLOPT_DEBUGFUNCTION instead.

This new way doesn't do the same thing. Here's a sample error:

connection to server at "127.0.0.1", port 56619 failed: failed to
fetch OpenID discovery document: Weird server reply ( Trying
127.0.0.1:36647...
Connected to localhost (127.0.0.1) port 36647 (#0)
Mark bundle as not supporting multiuse
HTTP 1.0, assume close after body
Invalid Content-Length: value
Closing connection 0
)

IMO that's too much noise. Prior to the change, the same error would have been

connection to server at "127.0.0.1", port 56619 failed: failed to
fetch OpenID discovery document: Weird server reply (Invalid
Content-Length: value)

The error buffer is finicky for sure, but it's also a generic one-line
explanation of what went wrong... Is there an alternative API for that
I'm missing?

> + /* && response_code != 401 TODO */ )
> Why is this marked with a TODO, do you remember?

Yeah -- I have a feeling that 401s coming back are going to need more
helpful hints to the user, since it implies that libpq itself hasn't
authenticated correctly as opposed to some user-related auth failure.
I was hoping to find some sample behaviors in the wild and record
those into the suite.

> + print("# OAuth provider (PID $pid) is listening on port $port\n");
> Code running under Test::More need to use diag() for printing non-test output
> like this.

Ah, thanks.

> +#if LIBCURL_VERSION_MAJOR <= 8 && LIBCURL_VERSION_MINOR < 4

I don't think this catches versions like 7.76, does it? Maybe
`LIBCURL_VERSION_MAJOR < 8 || (LIBCURL_VERSION_MAJOR == 8 &&
LIBCURL_VERSION_MINOR < 4)`, or else `LIBCURL_VERSION_NUM < 0x080400`?

> my $pid = open(my $read_fh, "-|", $ENV{PYTHON}, "t/oauth_server.py")
> - // die "failed to start OAuth server: $!";
> + or die "failed to start OAuth server: $!";
>
> - read($read_fh, $port, 7) // die "failed to read port number: $!";
> + read($read_fh, $port, 7) or die "failed to read port number: $!";

The first hunk here looks good (thanks for the catch!) but I think the
second is not correct behavior. $! doesn't get set unless undef is
returned, if I'm reading the docs correctly. Yay Perl.

> + /* Sanity check the previous operation */
> + if (actx->running != 1)
> + {
> + actx_error(actx, "failed to queue HTTP request");
> + return false;
> + }

`running` can be set to zero on success, too. I'm having trouble
forcing that code path in a test so far, but we're going to have to do
something special in that case.

> Another issue I have is the sheer size and the fact that so much code is
> replaced by subsequent commits, so I took the liberty to squash some of this
> down into something less daunting. The attached v22 retains the 0001 and then
> condenses the rest into two commits for frontent and backend parts.

Looks good.

> I did drop
> the Python pytest patch since I feel that it's unlikely to go in from this
> thread (adding pytest seems worthy of its own thread and discussion), and the
> weight of it makes this seem scarier than it is.

Until its coverage gets ported over, can we keep it as a `DO NOT
MERGE` patch? Otherwise there's not much to run in Cirrus.

> The final patch contains fixes for all of the above review comments as well as
> a some refactoring, smaller clean-ups and TODO fixing. If these fixes are
> accepted I'll incorporate them into the two commits.
>
> Next I intend to work on writing documentation for this.

Awesome, thank you! I will start adding coverage to the new code paths.

--Jacob

[1] https://datatracker.ietf.org/doc/html/rfc7159#section-11

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ants Aasma 2024-04-01 22:09:57 Re: Popcount optimization using AVX512
Previous Message Andrew Dunstan 2024-04-01 22:06:49 Re: Security lessons from liblzma