Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-02-12 16:47:20
Message-ID: 2282783.1644684440@sss.pgh.pa.us
Lists: pgsql-hackers

Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> 11.02.2022 05:22, Andres Freund wrote:
>> Over in another thread I made some wild unsubstantiated guesses that the
>> windows issues could have been made much more likely by a somewhat odd bit of
>> code in PQisBusy():
>> https://postgr.es/m/1959196.1644544971%40sss.pgh.pa.us
>> Alexander, any chance you'd try if that changes the likelihood of the problem
>> occurring, without any other fixes / reverts applied?

> Unfortunately I haven't seen an improvement for the test in question.

Yeah, that's what I expected, sadly. While I think this PQisBusy behavior
is definitely a bug, it will not lead to an infinite loop, just to write
failures being reported in a less convenient fashion than intended.

I wonder whether it would help to put a PQconsumeInput call *before*
the PQisBusy loop, so that any pre-existing EOF condition will be
detected. If you don't like duplicating code, we could restructure
the loop as

for (;;)
{
    int         rc;

    /* Consume whatever data is available from the socket */
    if (PQconsumeInput(streamConn) == 0)
    {
        /* trouble; return NULL */
        return NULL;
    }

    /* Done? */
    if (!PQisBusy(streamConn))
        break;

    /* Wait for more data */
    rc = WaitLatchOrSocket(MyLatch,
                           WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
                           WL_LATCH_SET,
                           PQsocket(streamConn),
                           0,
                           WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE);

    /* Interrupted? */
    if (rc & WL_LATCH_SET)
    {
        ResetLatch(MyLatch);
        ProcessWalRcvInterrupts();
    }
}

/* Now we can collect and return the next PGresult */
return PQgetResult(streamConn);

In combination with the PQisBusy fix, this might actually help ...

regards, tom lane
