Re: Add two missing tests in 035_standby_logical_decoding.pl

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add two missing tests in 035_standby_logical_decoding.pl
Date: 2023-05-02 11:22:26
Message-ID: 8a13f0d8-71f3-773b-7135-5974fd53724b@gmail.com
Lists: pgsql-hackers

Hi,

On 5/2/23 8:28 AM, Amit Kapila wrote:
> On Fri, Apr 28, 2023 at 2:24 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>>
>> I can see V7 failing on "Cirrus CI / macOS - Ventura - Meson" only (other machines are not complaining).
>>
>> It does fail on "invalidated logical slots do not lead to retaining WAL", see https://cirrus-ci.com/task/4518083541336064
>>
>> I'm not sure why it is failing, any idea?
>>
>
> I think the reason for the failure is that on standby, the test is not
> able to remove the file corresponding to the invalid slot. You are
> using pg_switch_wal() to generate a switch record and I think you need
> one more WAL-generating statement after that to achieve your purpose
> which is that during checkpoint, the test removes the WAL file
> corresponding to an invalid slot. Just doing a checkpoint on the primary may
> not serve the need as that doesn't lead to any new insertion of WAL on
> standby. Is your v6 failing in the same environment?

Thanks for the feedback!

No, V6 was working fine.

> If not, then it
> is probably due to the reason that the test is doing insert after
> pg_switch_wal() in that version. Why did you change the order of
> insert in v7?
>

I thought doing the insert before the switch was OK, and as my local test
was running fine, I did not reconsider the ordering.

> BTW, you can confirm the failure by changing the DEBUG2 message in
> RemoveOldXlogFiles() to LOG. In the case where the test fails, it may
> not remove the WAL file corresponding to an invalid slot whereas it
> will remove the WAL file when the test succeeds.

Yeah, I added more debug information, and what I can see is that the WAL file
we want to see removed is "000000010000000000000003", while the standby emits:

"
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG: attempting to remove WAL segments older than log file 000000000000000000000002
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG: recycled write-ahead log file "000000010000000000000002"
"
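To make the file names in that excerpt easier to read, here is a minimal Python sketch (not PostgreSQL code) of how a WAL segment file name breaks down, assuming the default 16 MB wal_segment_size: 8 hex digits of timeline ID, then the absolute segment number split into a "log" part and a segment-within-log part. The helper names are hypothetical, and would_be_removed() only approximates the name comparison in RemoveOldXlogFiles(), which ignores the timeline part and treats names at or below the cutoff as removable (consistent with "...0002" itself being recycled above). The real function also handles archive status, recycling and other checks that are not modeled here.

```python
# Sketch only: decode PostgreSQL WAL segment file names and mimic the
# "old enough to remove?" name comparison.
# Assumption: default wal_segment_size of 16 MB.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments per 4 GB "log" id (256 for 16 MB segments).
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def wal_file_name(tli, segno):
    """Build a 24-hex-digit WAL file name: timeline, log id, segment in log."""
    return "%08X%08X%08X" % (
        tli,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def parse_wal_file_name(name):
    """Return (timeline, absolute segment number) for a WAL file name."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return tli, log * SEGMENTS_PER_XLOGID + seg


def would_be_removed(candidate, cutoff):
    """Hypothetical helper: skip the 8-char timeline prefix, then treat any
    segment whose remaining name sorts at or below the cutoff as removable."""
    return candidate[8:] <= cutoff[8:]


if __name__ == "__main__":
    cutoff = "000000000000000000000002"  # name from the checkpointer LOG line
    for name in ("000000010000000000000002", "000000010000000000000003"):
        tli, segno = parse_wal_file_name(name)
        print(name, "tli", tli, "segno", segno,
              "removable:", would_be_removed(name, cutoff))
```

Fed the names from the log, this should report segment 2 as removable and segment 3 (the file the test expects to disappear) as not, which matches what the standby's checkpointer did.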

As per your suggestion, changing the insert ordering (as in the attached V8) makes it work in the failing environment too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment: v8-0001-Add-a-test-to-verify-that-invalidated-logical-slo.patch (text/plain, 2.5 KB)
