Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-01-10 07:00:00
Message-ID: 3b904d7b-ef84-6f1b-9326-9f88c1374eb8@gmail.com
Lists: pgsql-hackers

10.01.2022 05:00, Thomas Munro wrote:
> On Mon, Jan 10, 2022 at 8:06 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> On Mon, Jan 10, 2022 at 12:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>>> Going down through the call chain, I see that at the end of it
>>> WaitForMultipleObjects() hangs while waiting for the primary connection
>>> socket event. So it looks like the socket, that is closed by the
>>> primary, can get into a state unsuitable for WaitForMultipleObjects().
>> I wonder if FD_CLOSE is edge-triggered, and it's already told us once.
> Can you reproduce it with this patch?
Unfortunately, this fix (with the correction "(cur_event &
WL_SOCKET_MASK)" -> "(cur_event->events & WL_SOCKET_MASK)") doesn't work,
because we have two separate calls to libpqrcv_PQgetResult():
> Then we get COMMAND_OK here:
>         res = libpqrcv_PQgetResult(conn->streamConn);
>         if (PQresultStatus(res) == PGRES_COMMAND_OK)
> and finally just hang at:
>             /* Verify that there are no more results. */
>             res = libpqrcv_PQgetResult(conn->streamConn);
The libpqrcv_PQgetResult() function, in turn, invokes WaitLatchOrSocket(),
where the WaitEvents are defined locally, so the closed flag is set on the
first invocation but is expected to be checked on the second one.
>> I've managed to reproduce this failure too.
>> Removing "shutdown(MyProcPort->sock, SD_SEND);" doesn't help here, so
>> the culprit is exactly "closesocket(MyProcPort->sock);".
>>
> Ugh. Did you try removing the closesocket and keeping shutdown?
> I don't recall if we tried that combination before.
Even with shutdown() only I still observe WaitForMultipleObjects()
hanging (and WSAPoll() returns POLLHUP for the socket).

As to your concern regarding other clients, I suspect that this issue is
caused by libpqwalreceiver's specific call pattern, and maybe other
clients just don't do that. I need some more time to analyze this.

Best regards,
Alexander
