Re: [PATCH] Reload SSL certificates on SIGHUP

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reload SSL certificates on SIGHUP
Date: 2016-11-03 14:24:05
Message-ID: 20161103142404.GD15747@nighthawk.caipicrew.dd-dns.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

this is a first review of this patch.

As a feature, I think this functionality is very welcome. Having to
schedule a downtime in order to enable SSL or change the SSL certificate
is a nuisance and it might make admins think twice, reducing security.

The patch applies cleanly (modulo fuzz in the last hunk, but see below)
to master as of 00a86856, it compiles cleanly when using --with-openssl
and without that option and passes make check for both cases.

It does not update the documentation, I think at least section 18.9
"Secure TCP/IP Connections with SSL" needs updating as it says: "The
files server.key, server.crt, root.crt, and root.crl (or their
configured alternative names) are only examined during server start; so
you must restart the server for changes in them to take effect".

I (so far) lightly tested the functionality, and I could not find any
serious logic issues up till now. Changing or replacing the server
certificate with an expired one would make psql no longer connect with
sslmode=require.

However see below for segfaults during testing.

Some code comments on the patch:

On Mon, Oct 31, 2016 at 10:40:18AM +0100, Andreas Karlsson wrote:
> diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
> index 668f217..a1b582f 100644
> --- a/src/backend/libpq/be-secure-openssl.c
> +++ b/src/backend/libpq/be-secure-openssl.c
> @@ -77,12 +77,14 @@ static DH *generate_dh_parameters(int prime_len, int generator);
[...]
> + if ((context = initialize_context()) != NULL)
> {
[...]
> + be_tls_destroy();

I am getting segfaults on reloading the configuration for this call.

|Program received signal SIGSEGV, Segmentation fault.
|X509_VERIFY_PARAM_free (param=0x21) at x509_vpm.c:105
|105 x509_vpm.c: Datei oder Verzeichnis nicht gefunden.
|(gdb) bt full
|#0 X509_VERIFY_PARAM_free (param=0x21) at x509_vpm.c:105
|No locals.
|#1 0x00007f3b59745a09 in SSL_CTX_free (a=0x21) at ssl_lib.c:1955
| i = -1
|#2 0x00000000005ee314 in be_tls_destroy () at be-secure-openssl.c:197
|No locals.
|#3 be_tls_init () at be-secure-openssl.c:181
|No locals.
|#4 0x00000000005e3f55 in secure_initialize () at be-secure.c:74
|No locals.
|#5 0x000000000065f377 in SIGHUP_handler (postgres_signal_arg=<optimized out>) at postmaster.c:2524
| save_errno = 4
| __func__ = "SIGHUP_handler"
|#6 <signal handler called>
|No locals.
|#7 0x00007f3b58938873 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
|No locals.
|#8 0x000000000046e027 in ServerLoop () at postmaster.c:1672
| timeout = {tv_sec = 54, tv_usec = 568533}
| rmask = {fds_bits = {56, 0 <repeats 15 times>}}
| selres = <optimized out>
| now = <optimized out>
| readmask = {fds_bits = {56, 0 <repeats 15 times>}}
| last_lockfile_recheck_time = 1478169442
| last_touch_time = 1478169442
| __func__ = "ServerLoop"
|#9 0x000000000066263e in PostmasterMain (argc=argc(at)entry=3, argv=argv(at)entry=0x2405df0)
| at postmaster.c:1316
| opt = <optimized out>
| status = <optimized out>
| userDoption = <optimized out>
| listen_addr_saved = <optimized out>
| i = <optimized out>
| output_config_variable = <optimized out>
| __func__ = "PostmasterMain"
|#10 0x000000000046f6b7 in main (argc=3, argv=0x2405df0) at main.c:228
|No locals.

What I'm doing is:

|postgres=# SHOW ssl;
| ssl
|-----
| on
|(1 row)
|
|postgres=# ALTER SYSTEM SET ssl = off;
|ALTER SYSTEM
|postgres=# SELECT pg_reload_conf();
| pg_reload_conf
|----------------
| t
|(1 row)
|
|postgres=# SHOW ssl;
| ssl
|-----
| off
|(1 row)
|
|postgres=# ALTER SYSTEM SET ssl = on;
|ALTER SYSTEM
|postgres=# SELECT pg_reload_conf();
| pg_reload_conf
|----------------
| t
|(1 row)
|
|postgres=# SHOW ssl;
|FATAL: terminating connection due to unexpected postmaster exit
|server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.
|The connection to the server was lost. Attempting reset: Failed.
|!>

The other one I got:

|Program received signal SIGSEGV, Segmentation fault.
|sk_pop_free (st=0x100007f00000002, func=0x7fdbac018ed0 <ASN1_OBJECT_free>) at stack.c:291
|291 stack.c: Datei oder Verzeichnis nicht gefunden.
|(gdb) bt full
|#0 sk_pop_free (st=0x100007f00000002, func=0x7fdbac018ed0 <ASN1_OBJECT_free>) at stack.c:291
| i = <optimized out>
|#1 0x00007fdbac04324a in x509_verify_param_zero (param=0x10d4b40) at x509_vpm.c:85
|No locals.
|#2 X509_VERIFY_PARAM_free (param=0x10d4b40) at x509_vpm.c:105
|No locals.
|#3 0x00007fdbac33ca09 in SSL_CTX_free (a=0x100007f00000002) at ssl_lib.c:1955
| i = -1
|#4 0x00000000005ee84c in be_tls_destroy () at be-secure-openssl.c:197
|No locals.
|#5 0x00000000005e3f65 in secure_destroy () at be-secure.c:87
|No locals.
|#6 0x000000000065f395 in SIGHUP_handler (postgres_signal_arg=<optimized out>) at postmaster.c:2532
| save_errno = 4
| __func__ = "SIGHUP_handler"
|#7 <signal handler called>
|No locals.
|#8 0x00007fdbab52f873 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
|No locals.
|#9 0x000000000046e027 in ServerLoop () at postmaster.c:1672
| timeout = {tv_sec = 54, tv_usec = 411921}
| rmask = {fds_bits = {56, 0 <repeats 15 times>}}
| selres = <optimized out>
| now = <optimized out>
| readmask = {fds_bits = {56, 0 <repeats 15 times>}}
| last_lockfile_recheck_time = 1478182425
| last_touch_time = 1478182425
| __func__ = "ServerLoop"
|#10 0x000000000066263e in PostmasterMain (argc=argc(at)entry=4, argv=argv(at)entry=0x108de00)
| at postmaster.c:1316
| opt = <optimized out>
| status = <optimized out>
| userDoption = <optimized out>
| listen_addr_saved = <optimized out>
| i = <optimized out>
| output_config_variable = <optimized out>
| __func__ = "PostmasterMain"
|#11 0x000000000046f6b7 in main (argc=4, argv=0x108de00) at main.c:228
|No locals.

I ran pg_reload_conf() twice after setting ssl to off here:

|postgres=# SHOW ssl;
| ssl
|-----
| on
|(1 row)
|
|postgres=# ALTER SYSTEM SET ssl = off;
|ALTER SYSTEM
|postgres=# SELECT pg_reload_conf();
| pg_reload_conf
|----------------
| t
|(1 row)
|
|postgres=# SHOW ssl;
| ssl
|-----
| off
|(1 row)
|
|postgres=# SELECT pg_reload_conf();
| pg_reload_conf
|----------------
| t
|(1 row)
|
|postgres=# SHOW ssl;
|FATAL: terminating connection due to unexpected postmaster exit
|server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.
|The connection to the server was lost. Attempting reset: Failed.

I set a breakpoint in be_tls_destroy() and dumped SSL_context in gdb for
both pg_reload_conf() calls and the differences are:

-$17 = {method = 0x7f4faf6bce80 <SSLv23_method_data.16018>, cipher_list = 0x298d3a0,
- cipher_list_by_id = 0x298f140, cert_store = 0x298c4c0, sessions = 0x298c370,
+$18 = {method = 0x7f4fae955b28 <main_arena+1288>, cipher_list = 0x7f4fae955b28 <main_arena+1288>,
+ cipher_list_by_id = 0x298bf40, cert_store = 0x298bf40, sessions = 0x298c370,
[...]
- sess_timeout = 0, sess_cache_full = 0, sess_hit = 0, sess_cb_hit = 0}, references = 1,
+ sess_timeout = 0, sess_cache_full = 0, sess_hit = 0, sess_cb_hit = 0}, references = 0,
[...]
- comp_methods = 0x2969920, info_callback = 0x0, client_CA = 0x298cb80, options = 56098820, mode = 2,
+ comp_methods = 0x0, info_callback = 0x0, client_CA = 0x298cb80, options = 56098820, mode = 2,

Not sure whether this might be an issue with my openssl library (this is
on Debian stable).

> + SSL_context = context;
> + /* Set flag to remember if CA store is successfully loaded */
> + ssl_loaded_verify_locations = !!ssl_ca_file[0];

I am not sure this '!!' operator is according to project policy, it
seems to be not used elsewhere in the codebase except for imported or
auto-generated code. At least there should be a comment here, methinks.

[...]

> +
> + /* set up the allowed cipher list */
> + if (SSL_CTX_set_cipher_list(context, SSLCipherSuites) != 1)
> + INIT_CONTEXT_ERROR((errmsg("could not set the cipher list (no valid ciphers available)")))

Missing semicolon at the end of the line.

[...]

> +static bool
> +initialize_ecdh(SSL_CTX *context)
> {
> #ifndef OPENSSL_NO_ECDH
> EC_KEY *ecdh;
> int nid;
>
> nid = OBJ_sn2nid(SSLECDHCurve);
> - if (!nid)
> - ereport(FATAL,
> + if (!nid) {
> + ereport(LOG,

Opening braces should be on the next line.

> (errmsg("ECDH: unrecognized curve name: %s", SSLECDHCurve)));
> + return false;
> + }
>
> ecdh = EC_KEY_new_by_curve_name(nid);
> - if (!ecdh)
> - ereport(FATAL,
> + if (!ecdh) {
> + ereport(LOG,

Same.

> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 24add74..c997581 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -241,6 +241,8 @@ bool enable_bonjour = false;
> char *bonjour_name;
> bool restart_after_crash = true;
>
> +static bool LoadedSSL = false;
> +

All the delarations above this one are global variables for GUCs (see
postmaster.h, lines 16-31). So ISTM this static variable declaration
should be introduced by a comment in order to logically seperate it from
the previous ones, like the following static variables are:

> /* PIDs of special child processes; 0 when not running */
> static pid_t StartupPID = 0,
> BgWriterPID = 0,

[...]

> --- a/src/include/libpq/libpq.h
> +++ b/src/include/libpq/libpq.h
> @@ -85,6 +85,7 @@ extern int (*pq_putmessage_hook) (char msgtype, const char *s, size_t len);
> extern int (*pq_flush_hook) (void);
>
> extern int secure_initialize(void);
> +extern void secure_destroy(void);
> extern bool secure_loaded_verify_locations(void);
> extern void secure_destroy(void);
> extern int secure_open_server(Port *port);
>

This hunk baffled me at first because two lines below your added
secure_destroy() declaration, the same line is already present. I did
some digging and it turns out we had a secure_destroy() in the ancient
past, but its implementation got removed in 2008 in 4e8162865 as there
were no (more) users of it, however, the declaration was kept on until
now.

So this hunk should be removed I guess.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2016-11-03 14:28:42 Re: WAL consistency check facility
Previous Message Kuntal Ghosh 2016-11-03 14:17:21 Re: WAL consistency check facility