From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, byavuz81(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, Michael Paquier <michael(at)paquier(dot)xyz> |
Subject: | Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0 |
Date: | 2022-02-11 21:54:54 |
Message-ID: | 2187263.1644616494@sss.pgh.pa.us |
Lists: | pgsql-bugs |
I wrote:
>> Hmm, maybe. It sure *looks* like we need to do
>> - return conn->asyncStatus == PGASYNC_BUSY || conn->write_failed;
>> + return conn->asyncStatus == PGASYNC_BUSY && !conn->write_failed;

> Dunno anything about the Windows angle, but after hacking together
> a reproducer for this, I'm fairly convinced that this is indeed a bug.

After further study, I now think that checking write_failed here is
the wrong thing entirely, because it's irrelevant to what we want to
do, which is keep reading until we've collected a server message or
seen read EOF.

What we do need to do though is check to see if we've closed the
socket. That's because places like libpqwalreceiver.c assume
that if PQisBusy is true, then PQsocket() must be valid:

    while (PQisBusy(streamConn))
    {
        int         rc;

        rc = WaitLatchOrSocket(MyLatch,
                               WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
                               WL_LATCH_SET,
                               PQsocket(streamConn),
                               0,
                               WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE);

It seems like the existing coding in PQisBusy could allow that to fail.
I scraped the buildfarm for instances of the "cannot wait on socket
event without a socket" error that latch.c would emit if that happened,
but found none going back three months. That's perhaps because of the
other behavior I noted: if the walsender crashes, we'll probably
report write failure immediately in PQsendQuery. So maybe this is
mostly hypothetical in practice, but I think what we actually want
here is as attached.
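
To make the condition described above concrete, here is a minimal sketch of the idea. It is not the attached patch and does not use libpq's real internals: MockConn, its fields, and mock_is_busy are hypothetical stand-ins, and "socket closed" is modeled as a descriptor of -1, which is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for libpq's internal connection state. */
typedef enum
{
    ASYNC_IDLE,
    ASYNC_BUSY
} AsyncStatus;

typedef struct
{
    AsyncStatus async_status;   /* is a query result still pending? */
    bool        write_failed;   /* did an earlier send fail? */
    int         sock;           /* socket descriptor; -1 once closed (assumed) */
} MockConn;

/*
 * Report "busy" only while a result is pending AND the socket is still open.
 * write_failed is deliberately not consulted: after a failed write we still
 * want the caller to keep reading until a server message or read EOF arrives.
 */
static bool
mock_is_busy(const MockConn *conn)
{
    return conn->async_status == ASYNC_BUSY && conn->sock >= 0;
}
```

With this shape, a caller that loops while the connection is busy and waits on the socket never sees "busy" together with an invalid descriptor, which is the assumption the walreceiver loop quoted above relies on.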

In any case, I still don't see a way for this error to result in
an infinite loop, so it doesn't seem like an explanation for the
Windows problems.

regards, tom lane

Attachment | Content-Type | Size |
---|---|---|
PQisBusy-fix.patch | text/x-diff | 933 bytes |