Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date: 2022-03-19 08:47:04
Message-ID: 20220319084704.GB2822749@rfd.leadboat.com
Lists: pgsql-hackers

On Mon, Jan 10, 2022 at 04:25:27PM -0500, Tom Lane wrote:
> Apropos of that, it's worth noting that wait_for_catchup *is*
> dependent on up-to-date stats, and here's a recent run where
> it sure looks like the timeout cause is AWOL stats collector:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2022-01-10%2004%3A51%3A34
>
> I wonder if we should refactor wait_for_catchup to probe the
> standby directly instead of relying on the upstream's view.

That would be nice. For logical replication tests, do we have a monitoring API
independent of the stats collector? If not, and we don't want to add one, a
hacky alternative might be for wait_for_catchup to run a WAL-writing command
every ~20s. That way, if the stats collector misses the datagram about the
standby reaching a certain LSN, later datagrams would give it further chances
to pick up the standby's position.
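
A rough sketch of that retry idea, in Python rather than the Perl of the TAP
framework, and only as an illustration. Every name here is hypothetical:
caught_up stands in for whatever query wait_for_catchup polls, and write_wal
stands in for any command that generates WAL on the upstream. The 180s timeout
and the 0.1s poll interval are assumptions for the example; only the ~20s nudge
interval comes from the suggestion above.

```python
import time
from typing import Callable


def wait_for_catchup_with_nudge(
    caught_up: Callable[[], bool],      # hypothetical: True once the standby has reached the target LSN
    write_wal: Callable[[], None],      # hypothetical: runs some WAL-writing command on the upstream
    timeout: float = 180.0,             # assumed overall limit, in seconds
    nudge_interval: float = 20.0,       # the "~20s" from the suggestion
    poll_interval: float = 0.1,         # assumed delay between polls
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll caught_up() until it is true or the timeout expires.

    Every nudge_interval seconds without success, call write_wal() so that
    fresh progress reports are generated in case an earlier one was lost.
    Returns True on success and False on timeout.
    """
    start = clock()
    last_nudge = start
    while True:
        if caught_up():
            return True
        now = clock()
        if now - start >= timeout:
            return False
        if now - last_nudge >= nudge_interval:
            write_wal()
            last_nudge = now
        sleep(poll_interval)
```

The clock and sleep parameters exist only so the loop can be exercised without
real delays. The nudge does not remove the dependency on the stats collector;
it only makes a single lost message less likely to cause a timeout.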
