Re: wake up logical workers after ALTER SUBSCRIPTION

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wake up logical workers after ALTER SUBSCRIPTION
Date: 2022-12-15 22:47:21
Message-ID: 20221215224721.GA694065@nathanxps13
Lists: pgsql-hackers

I tried setting wal_retrieve_retry_interval to 1ms for all TAP tests
(similar to what was done in 2710ccd), and I noticed that the recovery
tests consistently took much longer. Upon further inspection, this looks
like the same (or a very similar) race condition as the one described in
e5d494d's commit message [0]. With some added debug logs, I see that all
of the callers of MaybeStartWalReceiver() complete before SIGCHLD is
processed, so ServerLoop() waits for a minute before starting the WAL
receiver.

A simple fix is to have DetermineSleepTime() take the WalReceiverRequested
flag into consideration. The attached 0002 patch shortens the sleep time
to 100ms if it looks like we are waiting on a SIGCHLD. I'm not certain
this is the best approach, but it seems to fix the tests.
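
To illustrate the idea only, here is a minimal standalone sketch of that
clamping logic. It is not the code from the 0002 patch: the function name
determine_sleep_ms(), its parameters, and the constant name are hypothetical
stand-ins, and the real DetermineSleepTime() in the postmaster takes other
conditions into account as well.

```c
#include <stdbool.h>

/* Hypothetical name; 100ms is the shortened wait described above. */
#define SIGCHLD_WAIT_MS 100

/*
 * Simplified model of the sleep-time decision. "computed_ms" stands for
 * whatever timeout the server loop would otherwise use (for example
 * 60000 for the one-minute wait), and "wal_receiver_requested" models
 * the WalReceiverRequested flag.
 */
static int
determine_sleep_ms(int computed_ms, bool wal_receiver_requested)
{
    /*
     * If a WAL receiver start has been requested but not yet performed,
     * we may be waiting on a SIGCHLD, so do not sleep longer than the
     * short interval. Never lengthen an already shorter timeout.
     */
    if (wal_receiver_requested && computed_ms > SIGCHLD_WAIT_MS)
        return SIGCHLD_WAIT_MS;

    return computed_ms;
}
```

In this model, a pending request turns a one-minute wait into a 100ms one,
while a timeout that is already shorter is left untouched.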

On my machine, I see the following improvements in the tests (all units in
seconds):
                    HEAD    patched (v9)
check-world -j8      165    138
subscription         120     75
recovery             111    108

[0] https://postgr.es/m/21344.1498494720%40sss.pgh.pa.us

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v9-0001-wake-up-logical-workers-as-needed-instead-of-rely.patch text/x-diff 6.4 KB
v9-0002-handle-race-condition-when-restarting-wal-receive.patch text/x-diff 2.2 KB
v9-0003-set-wal_retrieve_retry_interval-to-1ms-in-tests.patch text/x-diff 2.2 KB
