Re: Fixing WAL instability in various TAP tests

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing WAL instability in various TAP tests
Date: 2021-09-25 15:20:06
Message-ID: 2B4C64CF-EE3F-474B-9685-6A927E5E49AE@enterprisedb.com
Lists: pgsql-hackers

> On Sep 25, 2021, at 7:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Leaving the tests brittle wastes developer time.
>
> Trying to make them proof against all possible settings would waste
> a lot more time, though.

You may be right, but the conversation about "all possible settings" was started by Noah. I was really just talking about tests that depend on WAL files not being removed but take no action to guarantee that, merely trusting that under default settings they won't be. I can't square that design with other TAP tests that do take measures to prevent WAL files from being removed. Why is the precaution taken in some tests but not others? If this is intentional, shouldn't a comment in the tests without such precautions explain that choice? Are they intentionally testing that the default WAL size GUC settings and WAL verbosity won't break the test?

This isn't a rhetorical question:

In src/test/recovery/t/015_promotion_pages.pl, the comments talk about how checkpoints affect what happens on the standby. The test issues an explicit checkpoint on the primary, and again later on the standby, so it is unclear whether that's what the comments refer to, or whether they also refer to implicit expectations about when or if other checkpoints will happen. The test breaks when I change the GUC settings, but I can fix that breakage by adding a replication slot to the test. Have I broken the purpose of the test by doing so, though? Does using a replication slot to keep the WAL from being removed early break what the test is designed to check?
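
For illustration, adding a replication slot to a TAP test might look roughly like the fragment below. This is a sketch under stated assumptions, not the actual change made to 015_promotion_pages.pl: `$primary`, `$standby`, and the slot name `standby_slot` are placeholder names, and the fragment assumes both node objects already exist and that the standby has not been started yet.

```perl
# Sketch only: $primary and $standby are assumed to be node objects that
# the test has already set up; 'standby_slot' is a made-up slot name.

# Create a physical replication slot on the primary. Once the standby
# streams through this slot, the primary keeps any WAL the standby has
# not yet received, whatever the checkpoint timing or WAL size settings.
$primary->safe_psql('postgres',
	"SELECT pg_create_physical_replication_slot('standby_slot')");

# Tell the standby to use that slot, then start it.
$standby->append_conf('postgresql.conf',
	"primary_slot_name = 'standby_slot'");
$standby->start;
```

Whether retaining WAL this way defeats what a given test is meant to exercise is exactly the open question here.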

The other tests raise similar questions. Is the brittleness intentional?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
