Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, byavuz81(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0
Date: 2022-02-08 00:30:35
Message-ID: 1113966.1644280235@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> The seeming timing problem with the two CRL tests remains.

I spent some more time poking at this, and found that:

* There are three tests, not two, that intermittently fail.
They are at 001_ssltests.pl lines 565, 608, 618. It's suspicious
that these are exactly the tests that expect to see "sslv3 alert"
or "tlsv1 alert" conditions rather than anything higher-level;
but I don't have any insight as to why that might be relevant.

* The failure occurs on the WRITE side, not the read side; the
'server closed the connection unexpectedly' message we see coming
back from libpq is from pqsecure_raw_write. (I verified this by
changing the texts of the various instances of that message.)

* If I make my_sock_write ignore EPIPE/ECONNRESET, as per the
attached entirely-uncommittable patch, the errors go away.
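
For readers without the attachment, the idea in the last item can be sketched roughly as follows. This is not the attached patch and not libpq code: lenient_write, raw_write_fn and fake_failing_write are made-up names for illustration. The assumption is simply that a write failing with EPIPE or ECONNRESET is reported as if all bytes had been sent, so that the failure surfaces later on the read side.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical signature for the underlying raw socket write. */
typedef ssize_t (*raw_write_fn) (void *ctx, const void *buf, size_t len);

/*
 * Call raw_write, but treat EPIPE/ECONNRESET as if the whole buffer had
 * been written.  Any other error is passed through with errno unchanged.
 */
static ssize_t
lenient_write(raw_write_fn raw_write, void *ctx, const void *buf, size_t len)
{
    ssize_t     n;

    assert(raw_write != NULL);

    n = raw_write(ctx, buf, len);
    if (n < 0 && (errno == EPIPE || errno == ECONNRESET))
    {
        /* Pretend success; a later read is expected to report the problem. */
        return (ssize_t) len;
    }
    return n;
}

/*
 * Demo-only stand-in for a socket write: always fails with the errno
 * value that ctx points to.
 */
static ssize_t
fake_failing_write(void *ctx, const void *buf, size_t len)
{
    (void) buf;
    (void) len;
    errno = *(int *) ctx;
    return -1;
}
```

With the stand-in writer, an EPIPE failure comes back as a full-length write, while something like EINTR is still returned as -1.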

I hypothesize that something about OpenBSD scheduling is allowing the
server to (sometimes) exit before the client-side openssl has flushed
all its buffers, and the client-side code doesn't handle that well.
It's not very clear why this wouldn't be affecting all users of
OpenSSL, but there you have it.

While the attached is surely no good as a general patch, could we
get away with ignoring EPIPE/ECONNRESET in writes during connection
startup? We'd notice the failure soon enough on the read side if
it's not this problem. (This seems a bit related to libpq's other
hacks that postpone recognition of write failures.)
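
A rough sketch of what "only during connection startup" could look like, under stated assumptions: DemoConn and defer_write_error are hypothetical names, not libpq structures or functions. The point is that the error is swallowed only while the handshake is in progress, and is remembered so that it can still be reported if the read side never produces a better message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state, for illustration only. */
typedef struct DemoConn
{
    bool        in_startup;     /* true until connection startup completes */
    bool        write_failed;   /* a write error was swallowed and is pending */
    int         write_errno;    /* errno of the swallowed write error */
} DemoConn;

/*
 * Decide whether a write error should be postponed.  Returns true (and
 * records the error in conn) only for EPIPE/ECONNRESET seen during
 * connection startup; everything else must be reported immediately.
 */
static bool
defer_write_error(DemoConn *conn, int err)
{
    assert(conn != NULL);

    if (!conn->in_startup)
        return false;
    if (err != EPIPE && err != ECONNRESET)
        return false;

    conn->write_failed = true;
    conn->write_errno = err;
    return true;
}
```

A caller would report the saved write_errno only if the subsequent read fails to explain why the connection went away.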

By the by, today's fairywren failure [1] sure looks related:

# Failed test 'intermediate client certificate is missing: matches'
# at t/001_ssltests.pl line 608.
# 'psql: error: connection to server at "127.0.0.1", port 50577 failed: could not receive data from server: Software caused connection abort (0x00002745/10053)
# SSL SYSCALL error: Software caused connection abort (0x00002745/10053)
# could not send startup packet: No error (0x00000000/0)'
# doesn't match '(?^:SSL error: tlsv1 alert unknown ca)'

This is evidently on the read not write side, so it's not quite
the same thing, but ...

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-02-07%2021%3A04%3A53

Attachment: ignore-ssl-write-errors.patch (text/x-diff, 429 bytes)
