Re: RFC: seccomp-bpf support

From: Andres Freund <andres(at)anarazel(dot)de>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Brindle <joshua(dot)brindle(at)crunchydata(dot)com>
Subject: Re: RFC: seccomp-bpf support
Date: 2019-08-28 18:07:09
Message-ID: 20190828180709.rtont3eohw3wo6i4@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-08-28 11:13:27 -0400, Joe Conway wrote:
> Recent security best-practices recommend, and certain highly
> security-conscious organizations are beginning to require, that SECCOMP
> be used to the extent possible. The major web browsers, container
> runtime engines, and systemd are all examples of software that already
> support seccomp.

Maybe I'm missing something, but it's not clear to me what meaningful
attack surface can be reduced for PostgreSQL by forbidding certain
syscalls, given the wide variety of syscalls required to run postgres.
That's different from something like a browser's CSS process, or such,
which really doesn't need much beyond some IPC and memory
allocations. But postgres is going to need syscalls as broad as
fork/clone, exec, connect, shm*, etc. I guess you can argue that we'd
still reduce the attack surface for kernel escalations, but that seems
like a pretty small win compared to the cost.

> * With built-in support, it is possible to lock down backend processes
> more tightly than the postmaster.

Which important syscalls that postmaster needs could you get away with
removing in backends? The only one I can think of, and it is a good one,
is listen(). But even that might be too restrictive for some PLs running
out of process.

My main problem with seccomp is that it's *incredibly* fragile,
especially for a program as complex as postgres. We've already had
seccomp-related bug reports on the list, caused by nothing more than the
very permissive filtering applied by some container solutions.

There are regularly new syscalls (e.g. epoll_create1(), and we'll soon get
openat2()), different versions of glibc use different syscalls
(e.g. switching from open() to always using openat()), the system
configuration influences which syscalls are being used (e.g. vsyscalls
only being used for certain clock sources), and kernel bugfixes change
the exact set of syscalls being used ([1]).

[1] https://lwn.net/Articles/795128/
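
To make the glibc example concrete, here is a minimal sketch of how an
allow list recorded on one system can fall short on another. It assumes
strace-style trace lines as input; the two traces are made-up data, and
syscalls_in_trace() and newly_required() are hypothetical helper names,
not part of any patch discussed in this thread.

```python
import re

# Matches the syscall name at the start of an strace-style line,
# e.g. 'openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3'.
SYSCALL_RE = re.compile(r"^(\w+)\(")


def syscalls_in_trace(trace):
    """Return the set of syscall names seen in strace-style output."""
    names = set()
    for line in trace.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def newly_required(recorded, current):
    """Syscalls in `current` that an allow list built from `recorded` lacks."""
    return sorted(syscalls_in_trace(current) - syscalls_in_trace(recorded))


# Hypothetical traces of the same file read on two systems (made-up data).
OLD_GLIBC_TRACE = """
open("/etc/hosts", O_RDONLY) = 3
read(3, "...", 4096) = 174
close(3) = 0
"""
NEW_GLIBC_TRACE = """
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
read(3, "...", 4096) = 174
close(3) = 0
"""

print(newly_required(OLD_GLIBC_TRACE, NEW_GLIBC_TRACE))
```

With a filter built from the first trace, the same application code
would be blocked on the second system even though nothing in the
application changed.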

Then there's also the issue that many extensions are going to need
additional syscalls.

> Notes on usage:
> ===============
> In order to determine your minimally required allow lists, do something
> like the following on a non-production server with the same architecture
> as production:

> c) Cut and paste the result as the value of session_syscall_allow.

That seems nearly guaranteed to miss a significant fraction of
syscalls. There's just no way we're going to cover all the potential
paths and configurations in our test suite.
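
As a rough illustration of that coverage gap, the sketch below builds an
allow list as the union of syscalls seen in recorded test runs and then
checks a workload the tests never exercised. All syscall lists are
hypothetical, the function names are made up for this example, and the
comma-separated rendering of the setting value is an assumption about
its format.

```python
def build_allow_list(test_run_traces):
    """Union of syscall names observed across all recorded test runs."""
    allowed = set()
    for observed in test_run_traces:
        allowed |= set(observed)
    return allowed


def format_setting(allowed):
    """Render the allow list as one comma-separated string (assumed format)."""
    return ",".join(sorted(allowed))


def denied_in_production(allowed, production_syscalls):
    """Syscalls production would issue that the recorded allow list forbids."""
    return sorted(set(production_syscalls) - allowed)


# Hypothetical data: syscalls seen in two test runs, and on a production
# path (say, an extension opening a socket) that the tests never reached.
TEST_RUNS = [
    ["read", "write", "openat", "close"],
    ["read", "write", "fsync", "close"],
]
PRODUCTION = ["read", "write", "openat", "close", "socket", "connect"]

allow_list = build_allow_list(TEST_RUNS)
print(format_setting(allow_list))
print(denied_in_production(allow_list, PRODUCTION))
```

The recorded list can only ever contain what the test runs happened to
exercise, so every untested code path is a potential denial at runtime.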

I think if you actually wanted to do something like this, you'd need to
use static analysis to come up with a more reliable list.

Greetings,

Andres Freund
