Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-01-11 20:10:42
Message-ID: CA+hUKGLP+j9vuyJ2U8m+xKgx-gfCA9aJqfrchH0dWUCgYwr4eQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 12, 2022 at 4:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> So here we get similar hanging on WaitLatchOrSocket().
> Just to make sure that it's indeed the same issue, I've removed socket
> shutdown&close and the test executed to the end (several times). Argh.

Ouch. I think our options at this point are:
1. Revert 6051857fc (and put it back when we have a working
long-lived WES as I showed). This is not very satisfying, now that we
understand the bug, because even without that change I guess you must
be able to reach the hanging condition by using Windows postgres_fdw
to talk to a non-Windows server (i.e. a normal TCP stack with graceful
shutdown/linger on process exit).
2. Put your poll() check into the READABLE side. There's some
precedent for that sort of kludge on the WRITEABLE side (and a
rejection of the fragile idea that clients of latch.c should only
perform "safe" sequences):

    /*
     * Windows does not guarantee to log an FD_WRITE network event
     * indicating that more data can be sent unless the previous send()
     * failed with WSAEWOULDBLOCK.  While our caller might well have made
     * such a call, we cannot assume that here.  Therefore, if waiting for
     * write-ready, force the issue by doing a dummy send().  If the dummy
     * send() succeeds, assume that the socket is in fact write-ready, and
     * return immediately.  Also, if it fails with something other than
     * WSAEWOULDBLOCK, return a write-ready indication to let our caller
     * deal with the error condition.
     */
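
For illustration only, option 2 could look roughly like the sketch below:
before sleeping on the event, ask the kernel whether the socket is already
readable (data, EOF or error) with a zero-timeout poll(), and report it as
ready if so.  This is a POSIX approximation so that it can be tried anywhere;
on Windows the same check would presumably go through WSAPoll() or similar.
The names socket_readable_now() and demo_peer_close() are made up here, and
nothing below is taken from latch.c or from the patch under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from latch.c): report whether a read on "fd"
 * would return right now, whether with data, EOF or an error.
 */
static bool
socket_readable_now(int fd)
{
	struct pollfd pfd;
	int			rc;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Zero timeout: only inspect the current state, never block. */
	do
	{
		rc = poll(&pfd, 1, 0);
	} while (rc < 0 && errno == EINTR);

	/* If poll() itself fails, report ready so the caller's recv() sees it. */
	if (rc < 0)
		return true;

	return rc > 0 &&
		(pfd.revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)) != 0;
}

/* Self-check using a local socket pair in place of a real connection. */
static int
demo_peer_close(void)
{
	int			sv[2];
	char		c;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	/* Nothing has been sent yet, so the socket is not readable. */
	assert(!socket_readable_now(sv[0]));

	/* Pending data makes it readable. */
	if (write(sv[1], "x", 1) != 1)
		return -1;
	assert(socket_readable_now(sv[0]));

	/* Drain the byte, then let the peer go away: EOF counts as readable. */
	if (read(sv[0], &c, 1) != 1)
		return -1;
	close(sv[1]);
	assert(socket_readable_now(sv[0]));

	close(sv[0]);
	return 0;
}
```

The point of the final check in demo_peer_close() is that a closed peer is
reported as readable even though no new event will ever arrive for it, which
is the situation the hang is about.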
