Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-01-15 00:40:59
Message-ID: CA+hUKGLm-cgWDoGzj9Y=3SPKyWhvPCXXwnhUtv=2ePcLwWSbrA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 15, 2022 at 9:47 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Walreceiver only started using WES in
> 2016-03-29 [314cbfc5d] Add new replication mode synchronous_commit = 'remote_ap
>
> With that came the following comment:
>
> /*
> * Ideally we would reuse a WaitEventSet object repeatedly
> * here to avoid the overheads of WaitLatchOrSocket on epoll
> * systems, but we can't be sure that libpq (or any other
> * walreceiver implementation) has the same socket (even if
> * the fd is the same number, it may have been closed and
> * reopened since the last time). In future, if there is a
> * function for removing sockets from WaitEventSet, then we
> * could add and remove just the socket each time, potentially
> * avoiding some system calls.
> */
> Assert(wait_fd != PGINVALID_SOCKET);
> rc = WaitLatchOrSocket(MyLatch,
> WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
> WL_TIMEOUT | WL_LATCH_SET,
> wait_fd,
> NAPTIME_PER_CYCLE,
> WAIT_EVENT_WAL_RECEIVER_MAIN);
>
> I don't really see how libpq could have changed the socket underneath us, as
> long as we get it the first time after the connection successfully was
> established? I mean, there's a running command that we're processing the
> result of?

Erm, I didn't analyse the situation much back then; I just knew that
libpq could reconnect in early phases. I can see that once you reach
that stage you can count on socket stability, though. So yeah, it
should all work nicely as long as you handle the earlier connection
phase correctly (probably using the other patch I posted and
Alexander tested). You'd probably want to formalise the
interface/documentation on that point.
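
To illustrate the hazard the old comment was worried about, here is a
minimal Linux-only sketch using raw epoll rather than PostgreSQL's
WaitEventSet API. The function name demo_fd_reuse and the use of
socketpair() and dup2() to simulate a "reconnect" are my own
assumptions for illustration, not anything in the tree. It shows that
an epoll registration follows the underlying socket, not the fd
number: when the old socket is closed and a new one takes over the
same number, the set silently stops reporting it until it is added
again.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code.  Registers a socket with epoll,
 * replaces it with a different socket at the same fd number, and reports
 * how many events epoll_wait() sees before and after re-adding the fd.
 * Returns 0 on success, -1 if any setup call fails.
 */
int
demo_fd_reuse(int *before_readd, int *after_readd)
{
    int         old_pair[2];
    int         new_pair[2];
    struct epoll_event ev = {0};
    struct epoll_event out;
    int         ep;
    int         fd;

    ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, old_pair) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, new_pair) < 0)
        return -1;

    /* Register the "old connection" for readability. */
    fd = old_pair[0];
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;

    /*
     * Simulate a reconnect: dup2() closes the old socket and makes the
     * same fd number refer to the new one.  Closing the old socket drops
     * its epoll registration.
     */
    if (dup2(new_pair[0], fd) < 0)
        return -1;
    close(new_pair[0]);

    /* Make the new socket readable. */
    if (write(new_pair[1], "x", 1) != 1)
        return -1;

    /* The epoll set knows nothing about the new socket: this times out. */
    *before_readd = epoll_wait(ep, &out, 1, 100);

    /* Adding the fd again registers the new socket; now it is reported. */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;
    *after_readd = epoll_wait(ep, &out, 1, 100);

    close(fd);
    close(new_pair[1]);
    close(old_pair[1]);
    close(ep);
    return 0;
}
```

If the socket is known to be stable once the connection is
established, a long-lived set never hits the first case, which is the
point being conceded above.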

> Nor do I understand what "any other walreceiver implementation"
> refers to?

I think I meant that it goes via function pointers to talk to
libpqwalreceiver.c, but I know now that we don't actually support
using that to switch to different code; it's just a solution to a
backend/frontend linking problem. The comment was probably just
paranoia based on the way the interface works.
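
For anyone unfamiliar with the pattern, here is a rough sketch of what
"goes via function pointers" means. All names here (ReceiverFunctions,
module_init, core_connect_and_get_socket, the fake socket number 42)
are hypothetical and chosen for illustration; this is not the actual
walreceiver interface. The core code only ever calls through a table
that a separately linked module fills in when it is loaded, so the
core never needs to link against the module's dependencies directly.

```c
#include <stddef.h>

/* Hypothetical table of entry points provided by a loadable module. */
typedef struct ReceiverFunctions
{
    int         (*connect) (const char *conninfo);
    int         (*get_socket) (void);
} ReceiverFunctions;

/* Core side: stays NULL until the module has been initialised. */
static const ReceiverFunctions *receiver_functions = NULL;

/* Module side: a stand-in implementation with a fake socket number. */
static int  fake_socket = -1;

static int
module_connect(const char *conninfo)
{
    /* Pretend any non-empty connection string succeeds. */
    fake_socket = (conninfo != NULL && conninfo[0] != '\0') ? 42 : -1;
    return fake_socket >= 0;
}

static int
module_get_socket(void)
{
    return fake_socket;
}

static const ReceiverFunctions module_functions = {
    module_connect,
    module_get_socket
};

/* Called once when the module is loaded; publishes the table. */
void
module_init(void)
{
    receiver_functions = &module_functions;
}

/* Core side: reaches the module only through the function pointers. */
int
core_connect_and_get_socket(const char *conninfo)
{
    if (receiver_functions == NULL)
        return -1;              /* module not loaded yet */
    if (!receiver_functions->connect(conninfo))
        return -1;
    return receiver_functions->get_socket();
}
```

Nothing in this arrangement implies that a second implementation
exists or is supported; the indirection is only there to keep the
linking separate.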
