RE: Add two missing tests in 035_standby_logical_decoding.pl

From: "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Add two missing tests in 035_standby_logical_decoding.pl
Date: 2023-04-26 09:58:04
Message-ID: OSZPR01MB6310F24417BDD04731BD7978FD659@OSZPR01MB6310.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Mon, Apr 24, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On 4/24/23 11:45 AM, Amit Kapila wrote:
> > On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >>
> >> On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
> >> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>>
> >>
> >> Few comments:
> >> ============
> >>
> >
> > +# We can not test if the WAL file still exists immediately.
> > +# We need to let some time to the standby to actually "remove" it.
> > +my $i = 0;
> > +while (1)
> > +{
> > + last if !-f $standby_walfile;
> > + if ($i++ == 10 * $default_timeout)
> > + {
> > + die
> > + "could not determine if WAL file has been retained or not, can't continue";
> > + }
> > + usleep(100_000);
> > +}
> >
> > Is this adhoc wait required because we can't guarantee that the
> > checkpoint is complete on standby even after using wait_for_catchup?
>
> Yes, the restart point on the standby is not necessarily completed even after
> wait_for_catchup is done.
>

I think that's because, when replaying a checkpoint record, the startup process
on the standby only records the checkpoint information; the restartpoint itself
is performed later by the checkpointer (see RecoveryRestartPoint), right? If so,
could we force a checkpoint on the standby? After that, the standby should have
completed the restartpoint and we wouldn't need to wait.
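
A minimal sketch of that idea, not tested against the patch. It assumes the
node object provides safe_psql() as PostgreSQL::Test::Cluster does; the helper
name wal_removed_after_restartpoint is hypothetical and only for illustration.
Running CHECKPOINT during recovery forces a restartpoint, so a single check
afterwards could replace the polling loop, provided nothing else (for example a
replication slot) still retains the file:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): force a restartpoint on the
# standby, then check once whether the WAL file has been removed.
sub wal_removed_after_restartpoint
{
	my ($node, $walfile) = @_;

	# CHECKPOINT executed during recovery forces a restartpoint instead of
	# writing a new checkpoint record.
	$node->safe_psql('postgres', 'checkpoint;');

	# Old WAL is recycled or removed as part of the restartpoint, so no
	# sleep/retry loop is attempted here.
	return (-f $walfile) ? 0 : 1;
}
```

In the test this might be called as
wal_removed_after_restartpoint($node_standby, $standby_walfile), where
$node_standby is assumed to be the standby's node object.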

Besides, would it be better to also wait for the cascading standby? If a WAL
file still needed by the cascading standby is removed on the standby, the
subsequent tests will fail. Do we need to consider this scenario? I saw the
following error message after setting recovery_min_apply_delay to 5s on the
cascading standby, and the test failed due to a timeout while waiting for the
cascading standby.

Log of cascading standby node:
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed

Regards,
Shi Yu
