Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
Date: 2023-01-25 23:22:02
Message-ID: 0768cedb-695a-8841-5f8b-da2aa64c8f3a@timescale.com
Lists: pgsql-hackers

On 1/24/23 12:04, Robert Haas wrote:
> I find the concept of "ambient authentication" problematic. I don't
> know exactly what you mean by it. I hope you'll tell me,

Sure: Ambient authority [1] means that something is granted access based
on some aspect of its existence that it can't remove (or even
necessarily enumerate). Up above, when you said "I cannot choose not to
be myself," that's a clear marker that ambient authority is involved.
Examples of ambient authn/z factors might include an originating IP
address, the user ID of a connected peer process, the use of a loopback
interface, a GPS location, and so on. So 'peer' and 'ident' are ambient
authentication methods.

And, because I think it's useful, I'll extend the definition to include
privileges that _could_ be dropped by a proxy, but in practice are
included because there's no easy way not to. Examples for libpq include
the automatic use of the client certificate in ~/.postgresql, or any
Kerberos credentials available in the local user cache. (Or even a
PGPASSWORD set up and forgotten by a DBA.)

Ambient authority is closely related to the confused deputy problem [2],
and the proxy discussed here is a classic confused deputy. The proxy
doesn't know that a core piece of its identity has been used to
authenticate the request it's forwarding. It can't choose its IP
address, or its user ID.
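
To make the confused-deputy shape concrete, here is a toy model in Python. Every name in it is made up for illustration and is not taken from PostgreSQL. A server authenticates by the peer's OS user ID, and a proxy forwards requests under its own uid:

```python
from dataclasses import dataclass
from typing import Optional

# Toy model of a confused deputy. All names here are hypothetical.

@dataclass(frozen=True)
class Connection:
    peer_uid: int              # OS user ID of the process at the other end of the socket
    password: Optional[str]    # credential supplied explicitly in-band, if any

# A server using peer-style authentication: it maps the connecting process's
# OS user ID to a role and never looks at what the requester can prove.
ROLE_FOR_UID = {999: "postgres"}   # uid 999 is the proxy's own OS account

def server_authenticate(conn: Connection) -> Optional[str]:
    return ROLE_FOR_UID.get(conn.peer_uid)

PROXY_UID = 999

def proxy_forward(client_password: Optional[str]) -> Optional[str]:
    # The proxy cannot pick a different uid per request: every outgoing
    # connection carries its own, whether or not the client earned that authority.
    conn = Connection(peer_uid=PROXY_UID, password=client_password)
    return server_authenticate(conn)

# A client with no credentials at all still ends up authenticated as "postgres".
print(proxy_forward(None))  # postgres
```

Nothing in the server's check distinguishes the proxy acting for itself from the proxy acting for an anonymous client. That is the ambient part.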

I'm most familiar with this in the context of HTTP, cookie-/IP-based
authn, and cross-site request forgeries. Whenever someone runs a local
web server with no authentication and says "it's okay! we only respond
to requests from the local host!" they're probably about to be broken
open by the first person to successfully reflect a request through the
victim's (very local) web browser.

Ways to mitigate or solve this problem (that I know of) include

1) Forwarding the original ambient context along with the request, so
the server can check it too. HTTP has the Origin header, so a browser
can say, "This request is not coming from my end user; it's coming from
a page controlled by example.org. You can't necessarily treat attached
cookies like they're authoritative." The PROXY protocol lets a proxy
forward several ambient factors, including the originating IP address
(or even the use of a UNIX socket) and information about the original
TLS context.

2) Explicitly combining the request with the proof of authority needed
to make it, as in capability-based security [3]. Some web frameworks
push secret "CSRF tokens" into URLs for this purpose, to tangle the
authorization into the request itself [4]. I'd argue that the "password
requirement" implemented by postgres_fdw and discussed upthread was an
attempt at doing this, to try to ensure that the authentication comes
from the user explicitly and not from the proxy. It's just not very strong.

(require_auth would strengthen it quite a bit; a major feature of that
patchset is to explicitly name the in-band authentication factors that a
server is allowed to pull out of a client. It's still not strong enough
to make a true capability, for one because it's client-side only. But as
long as servers don't perform actions on behalf of users upon
connection, that's pretty good in practice.)

3) Dropping as many implicitly-held privileges as possible before making
a request. This doesn't solve the problem but may considerably reduce
the practical attack surface. For example, if browsers didn't attach
their user's cookies to cross-origin requests, cross-site request
forgeries would probably be considerably less dangerous (and, in the
years since I left the space, it looks like browsers have finally
stopped doing this by default). Upthread, Andres suggested disabling the
default inclusion of client certs and GSS creds, and I would extend that
to include really *anything* pulled in from the environment. Make the
DBA explicitly allow those things.
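
For 1), here is a minimal sketch of the text (v1) form of the PROXY protocol header. It assumes a server on the other end that parses the header and only trusts it when it arrives from a known proxy. v1 can only describe TCP over IPv4/IPv6. Forwarding a UNIX-socket origin or TLS details needs the binary v2 format, which this does not attempt:

```python
import ipaddress

def proxy_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 (text) header for a forwarded TCP connection.

    The proxy sends this line before any application bytes, so the server
    learns the original client's address instead of the proxy's own.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must share an address family")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    proto = "TCP4" if src.version == 4 else "TCP6"
    return f"PROXY {proto} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")

print(proxy_v1_header("192.0.2.7", "198.51.100.20", 51234, 5432))
# b'PROXY TCP4 192.0.2.7 198.51.100.20 51234 5432\r\n'
```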
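
For 2), here is a minimal sketch of the CSRF-token pattern. It assumes an HMAC over the session ID with a server-side key, which is one common construction among several. The key and function names are hypothetical:

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical per-deployment secret; a real framework manages this for you.
SERVER_KEY = secrets.token_bytes(32)

def make_token(session_id: str) -> str:
    """Token embedded in the pages/URLs served to this session."""
    return hmac.new(SERVER_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()

def request_is_authorized(session_id: str, token: Optional[str]) -> bool:
    # The session cookie alone is ambient: a browser attaches it to forged
    # cross-site requests too. The token must also travel in the request itself.
    if token is None:
        return False
    return hmac.compare_digest(make_token(session_id), token)
```

A forged cross-site request still carries the cookie, but the attacking page cannot read the token, so the check fails.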
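
For 3), here is a sketch of dropping environment-supplied credentials before a proxy spawns or configures a libpq client. Which variables count as ambient is an assumption here. The list below is illustrative and not exhaustive:

```python
from collections.abc import Mapping

# Variables that hand a child process credentials it never asked for.
# PG* covers libpq settings such as PGPASSWORD and PGSSLCERT; KRB5CCNAME
# points at a Kerberos credential cache. The exact list is an assumption.
AMBIENT_PREFIXES = ("PG",)
AMBIENT_NAMES = frozenset({"KRB5CCNAME"})

def scrubbed_env(env: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of env with ambient-credential variables removed."""
    return {
        k: v for k, v in env.items()
        if not k.startswith(AMBIENT_PREFIXES) and k not in AMBIENT_NAMES
    }
```

Usage would be something like `scrubbed_env(os.environ)`. File-based defaults, such as the client certificate in ~/.postgresql, are not reached this way and would need an explicit connection setting.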

> but I think
> that I won't like it even after I know, because as I said before, it's
> difficult to know why anyone else makes a decision, and asking an
> untrusted third-party why they're deciding something is sketchy at
> best.

I think that's a red herring. Setting aside that you can, in fact, prove
that the server has authenticated you (e.g. require_auth=scram-sha-256
in my proposed patchset), I don't think "untrusted servers, that we
don't control, doing something stupid" is a very useful thing to focus
on. We're trying to secure the case where a server *is* authenticating
us, using known useful factors, but those factors have been co-opted by
an attacker via a proxy.
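
Here is a rough client-side sketch of the require_auth idea. It assumes the client knows which method the server requested. This is not libpq's code, and it only parses a plain comma-separated list. The key behavior is that "none" is rejected unless explicitly allowed. "none" means the server let us in without asking for anything, as it does with trust, peer, or ident:

```python
def allowed_methods(require_auth: str) -> frozenset:
    """Parse a plain comma-separated list such as "scram-sha-256,gss"."""
    return frozenset(m.strip() for m in require_auth.split(",") if m.strip())

def check_server_request(requested: str, require_auth: str) -> None:
    """Abort unless the server asked for an allowed method.

    `requested` is the method the server demanded, or "none" if it let the
    client in without asking for anything (as trust, peer, or ident would).
    """
    allowed = allowed_methods(require_auth)
    if requested not in allowed:
        raise PermissionError(
            f"server requested {requested!r}; require_auth permits {sorted(allowed)}"
        )
```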

> I think that the problems we have in this area can be solved by
> either (a) restricting the open proxy to be less open or (b)
> encouraging people to authenticate users in some way that won't admit
> connections from an open proxy.

(a) is an excellent mitigation, and we should do it. (b) starts getting
shaky because I think peer auth is actually a very reasonable choice for
many people. So I hope we can also start solving the underlying problem
while we implement (a).

> we
> cannot actually prevent people from shooting themselves in the foot
> except, perhaps, by massively nerfing the capabilities of the system.

But I thought we already agreed that most DBAs do not want a massively
capable proxy? I don't think we have to massively nerf the system, but
let's say we did. Would that really be unacceptable for this use case?

(You're still driving hard down the "it's impossible for us to securely
handle both cases at the same time" path. I don't think that's true from
a technical standpoint, because we hold nearly total control of the
protocol. I think we're in a much easier situation than HTTP was.)

> What I was thinking about in terms of a "reverse pg_hba.conf" was
> something in the vein of, e.g.:
>
> SOURCE_COMPONENT SOURCE_DATABASE SOURCE_USER DESTINATION_SUBNET
> DESTINATION_DATABASE DESTINATION_USER OPTIONS ACTION
>
> e.g.
>
> all all all local all all - deny # block access through UNIX sockets
> all all all 127.0.0.0/8 all all - deny # block loopback interface via IPv4
>
> Or:
>
> postgres_fdw all all all all all authentication=cleartext,md5,sasl
> allow # allow postgres_fdw with password-ish authentication

I think this style focuses on absolute configuration flexibility at the
expense of usability. It obfuscates the common use cases. (I have the
exact same complaint about our HBA and ident configs, so I may be
fighting uphill.)

How should a DBA decide what is correct, or audit a configuration they
inherited from someone else? What makes it obvious why a proxy should
require cleartext auth instead of peer auth (especially since peer auth
seems to be inherently better, until you've read this thread)?

I'd rather the configuration focus on the pieces of a proxy's identity
that can be assumed by a client. For example, if the config has an
option for "let a client steal the proxy's user ID", and it's off by
default, then we've given the problem a name. DBAs can educate
themselves on it.

And if that option is off, then the implementation knows that

1) If the client has supplied explicit credentials and we can force the
server to use them, we're safe.
2) If the DBA says they're not running an ident server, or we can force
the server not to use ident authn, or the DBA pinky-swears that that
server isn't using ident authn, all IP connections are additionally safe.
3) If we have a way to forward the client's "origin" and we know that
the server will pay attention to it, all UNIX socket connections are
additionally safe.
4) Any *future* authentication method we add later needs to be
restricted in the same way.
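
The four points above can be read as a default-deny decision. Below is a literal sketch. Every name is hypothetical, and the inputs are assumed to come from DBA declarations or protocol features that may not exist yet:

```python
from dataclasses import dataclass
from enum import Enum

class Transport(Enum):
    UNIX_SOCKET = "unix"
    IP = "ip"

@dataclass(frozen=True)
class Target:
    transport: Transport
    client_supplied_credentials: bool    # explicit credentials from the client
    can_force_credentials: bool          # server can be forced to use them
    ident_ruled_out: bool                # DBA declared / enforced "no ident"
    origin_forwarded_and_honored: bool   # server checks a forwarded origin

def connection_is_safe(t: Target) -> bool:
    """Points 1-3, with "let a client steal the proxy's user ID" switched off."""
    # 1) Explicit client credentials that the server is forced to use.
    if t.client_supplied_credentials and t.can_force_credentials:
        return True
    # 2) IP connections are additionally safe once ident authn is ruled out.
    if t.transport is Transport.IP and t.ident_ruled_out:
        return True
    # 3) UNIX-socket connections are additionally safe if the client's origin
    #    is forwarded and the server pays attention to it.
    if t.transport is Transport.UNIX_SOCKET and t.origin_forwarded_and_honored:
        return True
    # 4) Anything else, including any future authentication method, is refused.
    return False
```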

Should we allow the use of our default client cert? The Kerberos cache?
Passwords from the environment? In this scheme, all of these are named
options, off by default.
DBAs can look through those options and say "oh, yeah, that seems like a
really bad idea because we have this one server over here..." And we
(the experts) now get to make the best decisions we can, based on a
DBA's declared intent, so the implementation gets to improve over time.

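
Here is a sketch of what "named and off by default" could look like as a settings object. All option names are hypothetical. The `enabled()` method is the view a DBA auditing an inherited configuration would read:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ProxyIdentityOptions:
    """Hypothetical proxy settings: one named switch per borrowable identity."""
    allow_client_to_use_proxy_uid: bool = False   # peer auth over UNIX sockets
    allow_client_to_use_proxy_ip: bool = False    # ident, samehost/samenet trust
    allow_default_client_cert: bool = False       # ~/.postgresql/postgresql.crt
    allow_kerberos_cache: bool = False            # local GSS/Kerberos credentials
    allow_environment_passwords: bool = False     # PGPASSWORD and friends

    def enabled(self) -> list:
        """The audit view: every grant a DBA has explicitly switched on."""
        return [f.name for f in fields(self) if getattr(self, f.name)]
```

With everything off by default, an empty `enabled()` list means the proxy lends out none of its own identity.
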
> Disallowing loopback connections feels quite tricky. You could use
> 127.anything.anything.anything, but you could also loop back via IPv6,
> or you could loop back via any interface. But you can't use
> subnet-based ACLs to rule out loop backs through IP/IPv6 interfaces
> unless you know what all your system's own IPs are. Maybe that's an
> argument in favor of having a dedicated deny-loopback facility built
> into the system instead of relying on IP ACLs. But I am not sure that
> really works either: how sure are we that we can discover all of the
> local IP addresses?

Well, to follow you down that road a little bit, I think that a DBA that
has set up `samehost ... trust` in their HBA is going to expect a
corresponding concept here, and it seems important for us to use an
identical implementation of samehost and samenet.
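
For a sense of what samehost-style matching involves, here is a minimal sketch using Python's ipaddress module. In pg_hba.conf, samehost matches any of the server's own IP addresses. This sketch also treats every loopback address (all of 127.0.0.0/8, ::1, and IPv4-mapped forms) as a match. It takes the list of local addresses as an input, because discovering them is the hard part raised above. samenet is not modeled:

```python
import ipaddress
from typing import Iterable, Union

Addr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

def _normalize(text: str) -> Addr:
    addr = ipaddress.ip_address(text)
    # Treat IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) as the IPv4 address it wraps.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr

def is_samehost(dest: str, local_addrs: Iterable[str]) -> bool:
    """True if dest loops back to this machine.

    local_addrs must come from the caller: enumerating every local interface
    address portably is exactly the hard part and is not attempted here.
    """
    addr = _normalize(dest)
    if addr.is_loopback:    # 127.0.0.0/8 and ::1
        return True
    return addr in {_normalize(a) for a in local_addrs}
```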

But I don't really want to follow you down that road, because I think
you illustrated my point yourself. You're already thinking about making
Disallowing Loopback Connections a first-class concept, but then you
immediately said

> Maybe it doesn't matter anyway, since the point is
> just to disallow anything that would be likely to use "trust" or
> "ident" authentication

I'd rather we enshrine that -- the point -- in the configuration, and
have the proxy disable everything that can't provably meet that intent.

Thanks,
--Jacob

[1] https://en.wikipedia.org/wiki/Ambient_authority
[2] https://en.wikipedia.org/wiki/Confused_deputy_problem
[3] https://en.wikipedia.org/wiki/Capability-based_security
[4] https://www.rfc-editor.org/rfc/rfc6265#section-8.2
