Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-02-12 10:00:00
Message-ID: e7a6d735-c71c-96ba-3bd3-43f6779efdcb@gmail.com
Lists: pgsql-hackers

Hello Andres,
11.02.2022 05:22, Andres Freund wrote:
> Over in another thread I made some wild unsubstantiated guesses that the
> windows issues could have been made much more likely by a somewhat odd bit of
> code in PQisBusy():
>
> https://postgr.es/m/1959196.1644544971%40sss.pgh.pa.us
>
> Alexander, any chance you'd try if that changes the likelihood of the problem
> occurring, without any other fixes / reverts applied?
Unfortunately, I haven't seen any improvement for the test in question.
With the PQisBusy-fix.patch from [1] and no other changes on the
master branch (52377bb8), it still fails (on iterations 13, 5, 2, and 2
in my runs).
The diagnostic logging I added (see the attachment):
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_receive: PQgetCopyData returned 0
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_receive: PQgetCopyData 2 returned -1
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_receive: end-of-streaming or error: -1
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_PQgetResult: streamConn->asyncStatus: 1 && streamConn->status: 0
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_receive libpqrcv_PQgetResult returned 10551584, 1
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_receive libpqrcv_PQgetResult PGRES_COMMAND_OK
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_PQgetResult: streamConn->asyncStatus: 1 && streamConn->status: 0
2022-02-12 01:04:16.341 PST [4912] LOG:  libpqrcv_PQgetResult loop before WaitLatchOrSocket
2022-02-12 01:04:16.341 PST [4912] LOG:  WSAEventSelect event->fd: 948, flags: 21
2022-02-12 01:04:16.341 PST [4912] LOG:  WaitLatchOrSocket before WaitEventSetWait
2022-02-12 01:04:16.341 PST [4912] LOG:  WaitEventSetWait before WaitEventSetWaitBlock
2022-02-12 01:04:16.341 PST [4912] LOG:  WaitEventSetWaitBlock before WaitForMultipleObjects: 3
...
shows that before the doomed WaitForMultipleObjects() call the field
conn->status is 0 (CONNECTION_OK).

[1] https://www.postgresql.org/message-id/2187263.1644616494%40sss.pgh.pa.us

Best regards,
Alexander

Attachment: libpqrcv-diagnostic.patch (text/x-patch, 6.1 KB)
