Re: A test for replay of regression tests

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A test for replay of regression tests
Date: 2021-04-23 15:20:31
Message-ID: 20210423152031.unkv7mmvhw4pbjuc@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-04-23 17:37:48 +1200, Thomas Munro wrote:
> We have automated tests for many specific replication and recovery
> scenarios, but nothing that tests replay of a wide range of records.
> People working on recovery code often use installcheck (presumably
> along with other custom tests) to exercise it, sometimes with
> wal_consistency_checking enabled. So, why don't we automate that? Aside
> from exercising the WAL decoding machinery (which brought me here),
> that'd hopefully provide some decent improvements in coverage of the
> various redo routines, many of which are not currently exercised at
> all.

Yay.

> I'm not quite sure where it belongs, though. The attached initial
> sketch patch puts it under src/test/recovery near other similar things,
> but I'm not sure if it's really OK to invoke make -C ../regress from
> here.

I'd say it's not ok, and we should just invoke pg_regress without make.
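A minimal sketch of what invoking pg_regress directly (instead of via make) could look like from the TAP test. It assumes the test already has a running primary and knows the path to the pg_regress binary and to the regress shared library. The helper name `pg_regress_command` and its option keys are hypothetical, not part of the patch or of PostgreSQL's test modules. The pg_regress switches used (`--bindir`, `--host`, `--port`, `--dlpath`, `--inputdir`, `--schedule`, `--outputdir`) are standard pg_regress options.

```perl
use strict;
use warnings;

# Hypothetical helper: return the argument list for running the core
# regression schedule with pg_regress directly against an already running
# server, rather than recursing into "make -C ../regress".
sub pg_regress_command
{
    my (%opt) = @_;

    # Every value must be supplied by the caller; nothing is guessed here.
    for my $key (qw(pg_regress host port dlpath regress_dir outputdir))
    {
        die "pg_regress_command: missing option '$key'\n"
          unless defined $opt{$key};
    }

    return (
        $opt{pg_regress},
        '--bindir=',                       # empty value: find psql via PATH
        "--host=$opt{host}",
        "--port=$opt{port}",
        "--dlpath=$opt{dlpath}",           # directory holding the regress library
        "--inputdir=$opt{regress_dir}",    # sql/ and expected/ are read from here
        "--schedule=$opt{regress_dir}/parallel_schedule",
        # A private output directory keeps results from clobbering a
        # concurrent run under src/test/regress.
        "--outputdir=$opt{outputdir}");
}
```

In the test, the returned list could be passed to `system()` with `host => $node_primary->host` and `port => $node_primary->port`, failing the test if `system()` does not return 0. Passing a list (rather than one shell string) avoids quoting problems with paths. Where the pg_regress path comes from (for example an environment variable exported by the build system) is an assumption left to the caller.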

> Add a new TAP test under src/test/recovery that runs the regression
> tests with wal_consistency_checking=all.

Hm. I wonder if running with wal_consistency_checking=all doesn't also
reduce coverage of some aspects of recovery, by increasing record sizes
etc.

> I copied pg_upgrade/test.sh's trick of using a different
> outputdir to avoid clobbering a concurrent run under src/test/regress,
> and I also needed to invent a way to stop it from running the cursed
> tablespace test (deferring startup of the standby also works but eats
> way too much space, which I learned after blowing out a smallish
> development VM's disk).

That's because you are using wal_consistency_checking=all, right?
Because IIRC we don't generate that much WAL otherwise?

> +# Create some content on primary and check its presence in standby 1
> +$node_primary->safe_psql('postgres',
> + "CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
> +
> +# Wait for standby to catch up
> +$node_primary->wait_for_catchup($node_standby_1, 'replay',
> + $node_primary->lsn('insert'));

> +my $result =
> + $node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
> +print "standby 1: $result\n";
> +is($result, qq(1002), 'check streamed content on standby 1');

Why is this needed?

Greetings,

Andres Freund
