Re: Add two missing tests in 035_standby_logical_decoding.pl

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add two missing tests in 035_standby_logical_decoding.pl
Date: 2023-05-06 02:10:04
Message-ID: CAA4eK1+-uuH+sC42_=wA=xPyzCN+U3VNeW_d-BRGvt-pGkNT+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 5, 2023 at 7:53 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
>
> On 5/5/23 2:28 PM, Amit Kapila wrote:
> > On Fri, May 5, 2023 at 5:36 PM Drouvot, Bertrand
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > It seems due to some reason the current wal file is not switched due
> > to some reason.
>
> Oh wait, here is a NON failing one: https://cirrus-ci.com/task/5086849685782528 (I modified the
> .cirrus.yml so that we can download the "testrun.zip" file even if the test is not failing).
>
> So, in this testrun.zip we can see, that the test is ok:
>
> $ grep -i retain ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding
> [10:06:08.789](0.000s) ok 19 - invalidated logical slots do not lead to retaining WAL
>
> and that the WAL file we expect to be removed is:
>
> $ grep "WAL file is" ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding
> [10:06:08.789](0.925s) # BDT WAL file is /Users/admin/pgsql/build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_wal/000000010000000000000003
>
> This WAL file has been removed by the standby:
>
> $ grep -i 000000010000000000000003 ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log | grep -i recy
> 2023-05-05 10:06:08.787 UTC [17521][checkpointer] DEBUG: 00000: recycled write-ahead log file "000000010000000000000003"
>
> But on the primary, it has been recycled way after that time:
>
> $ grep -i 000000010000000000000003 ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log | grep -i recy
> 2023-05-05 10:06:13.370 UTC [16785][checkpointer] DEBUG: 00000: recycled write-ahead log file "000000010000000000000003"
>
> As, the checkpoint on the primary after the WAL file switch only recycled (001 and 002):
>
> $ grep -i recycled ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log
> 2023-05-05 10:05:57.196 UTC [16785][checkpointer] LOG: 00000: checkpoint complete: wrote 4 buffers (3.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.027 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11219 kB, estimate=11219 kB; lsn=0/2000060, redo lsn=0/2000028
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG: 00000: recycled write-ahead log file "000000010000000000000001"
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG: 00000: recycled write-ahead log file "000000010000000000000002"
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] LOG: 00000: checkpoint complete: wrote 20 buffers (15.6%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB; lsn=0/40000D0, redo lsn=0/4000098
>
>
> So, even on a successful test, we can see that the WAL file we expect to be removed on the standby has not been recycled on the primary before the test.
>

Okay, one possibility of not removing on primary is that at the time
of checkpoint (when we compute RedoRecPtr), the wal_swtich and insert
is not yet performed because in that case it will compute the
RedoRecPtr as a location before those operations which would be 0000*3
file. However, it is not clear how is that possible except from a
background checkpoint happening at that point but from LOGs, it
appears that the checkpoint triggered by test has recycled the wal
files.

> > I think we need to add more DEBUG info to find that
> > out. Can you please try to print 'RedoRecPtr', '_logSegNo', and
> > recptr?
> >
>
> Yes, will do.
>

Okay, thanks, please try to print similar locations on standby in
CreateRestartPoint().

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-05-06 02:23:17 Re: Autogenerate some wait events code and documentation
Previous Message Jeff Davis 2023-05-06 00:25:18 Re: Order changes in PG16 since ICU introduction