Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, byavuz81(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0
Date: 2022-02-11 21:54:54
Message-ID: 2187263.1644616494@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
>> Hmm, maybe. It sure *looks* like we need to do

>> - return conn->asyncStatus == PGASYNC_BUSY || conn->write_failed;
>> + return conn->asyncStatus == PGASYNC_BUSY && !conn->write_failed;

> Dunno anything about the Windows angle, but after hacking together
> a reproducer for this, I'm fairly convinced that this is indeed a bug.

After further study, I now think that checking write_failed here is
the wrong thing entirely, because it's irrelevant to what we want to
do, which is keep reading until we've collected a server message or
seen read EOF.

What we do need to do though is check to see if we've closed the
socket. That's because places like libpqwalreceiver.c assume
that if PQisBusy is true, then PQsocket() must be valid:

    while (PQisBusy(streamConn))
    {
        int         rc;

        rc = WaitLatchOrSocket(MyLatch,
                               WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
                               WL_LATCH_SET,
                               PQsocket(streamConn),
                               0,
                               WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE);

It seems like the existing coding in PQisBusy could allow that to fail.
I scraped the buildfarm for instances of the "cannot wait on socket
event without a socket" error that latch.c would emit if that happened,
but found none going back three months. That's perhaps because of the
other behavior I noted: if the walsender crashes, we'll probably
report write failure immediately in PQsendQuery. So maybe this is
mostly hypothetical in practice, but I think what we actually want
here is as attached.
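
[Illustration, not the attached patch.] The reasoning above can be sketched with a small self-contained model. Everything named Mock* or mock_* below is a hypothetical stand-in invented for this example, not a real libpq field or function. The sketch compares the existing return test quoted earlier with a test that ignores write_failed but refuses to report "busy" once the socket has been closed. What the attached patch actually tests is not shown in this message, so treat the second predicate as an assumption about its general shape.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libpq internals; NOT the real PGconn layout. */
typedef enum { MOCK_ASYNC_IDLE, MOCK_ASYNC_BUSY } MockAsyncStatus;

#define MOCK_INVALID_SOCKET (-1)

typedef struct MockConn
{
    MockAsyncStatus async_status;   /* is a query still in flight? */
    bool        write_failed;       /* did an earlier send fail? */
    int         sock;               /* MOCK_INVALID_SOCKET once closed */
} MockConn;

/* Models the existing test quoted above: a write failure forces "busy". */
static bool
mock_is_busy_existing(const MockConn *conn)
{
    return conn->async_status == MOCK_ASYNC_BUSY || conn->write_failed;
}

/*
 * Models the idea in this message (an assumption, not the patch itself):
 * ignore write_failed, but never report "busy" once the socket is gone.
 */
static bool
mock_is_busy_socket_aware(const MockConn *conn)
{
    return conn->async_status == MOCK_ASYNC_BUSY &&
        conn->sock != MOCK_INVALID_SOCKET;
}

/* The invariant a wait loop like the one above relies on. */
static bool
mock_busy_implies_socket(bool busy, const MockConn *conn)
{
    return !busy || conn->sock != MOCK_INVALID_SOCKET;
}

/* Write failed and the socket has already been closed. */
static void
mock_demo(void)
{
    MockConn    closed = {MOCK_ASYNC_BUSY, true, MOCK_INVALID_SOCKET};

    /* Existing test: "busy" with no socket, so the invariant is broken. */
    assert(!mock_busy_implies_socket(mock_is_busy_existing(&closed), &closed));

    /* Socket-aware test: not busy, so the invariant holds. */
    assert(mock_busy_implies_socket(mock_is_busy_socket_aware(&closed), &closed));
}
```

In this model, a connection whose write has failed but whose socket is still open stays "busy" under the socket-aware test, which matches the goal of reading until a server message or read EOF arrives.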

In any case, I still don't see a way for this error to result in
an infinite loop, so it doesn't seem like an explanation for the
Windows problems.

regards, tom lane

Attachment Content-Type Size
PQisBusy-fix.patch text/x-diff 933 bytes
