Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
Date: 2022-02-15 04:02:41
Message-ID: Ygsl4Q7MARNY/OWI@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 12, 2022 at 08:50:41PM -0800, Andres Freund wrote:
> On 2022-01-18 11:20:16 +0900, Michael Paquier wrote:
>> +# required for 002_pg_upgrade.pl
>> +REGRESS_SHLIB=$(abs_top_builddir)/src/test/regress/regress$(DLSUFFIX)
>> +export REGRESS_SHLIB
>
> It seems weird to propagate this into multiple places. Why don't we define
> that centrally?
>
> Although it's weird for this to use REGRESS_SHLIB, given it's just doing
> dirname() on it. 027_stream_regress.pl has the "defense" of not wanting to
> duplicate the variable with 017_shm.pl...
>
> Not that I understand why 017_shm.pl and all the regression test source
> fileseven need $(DLSUFFIX) - expand_dynamic_library_name() should take care of
> it?

I agree that we should be able to get rid of that in the long-term,
but this also feels like a separate issue to me and the patch is
already doing a lot. I am wondering about the interactions of
installcheck with abs_top_builddir, though. Should it be addressed
first? It does not feel like a mandatory requirement for this
thread, anyway.

> It's not like make / msvc put the data in different places:
> src/test/recovery/Makefile:REGRESS_OUTPUTDIR=$(abs_top_builddir)/src/test/recovery/tmp_check
> src/tools/msvc/vcregress.pl: $ENV{REGRESS_OUTPUTDIR} = "$topdir/src/test/recovery/tmp_check";

Yeah, removed.

>> +# From now on, the test of pg_upgrade consists in setting up an instance.
>
> What does "from now on" mean?

In this context, the next steps of the test. Removed.

>> +# Default is the location of this source code for both nodes used with
>> +# the upgrade.
>
> Can't quite parse.

Reworded, to something hopefully better.

>> +# Initialize a new node for the upgrade. This is done early so as it is
>> +# possible to know with which node's PATH the initial dump needs to be
>> +# taken.
>> +my $newnode = PostgreSQL::Test::Cluster->new('new_node');
>> +$newnode->init(extra => [ '--locale=C', '--encoding=LATIN1' ]);
>> +my $newbindir = $newnode->config_data('--bindir');
>> +my $oldbindir = $oldnode->config_data('--bindir');
>
> Why C/LATIN?

Well, these are bits from the patch that I have played with
extensively, and it took me some time to remember why this was needed.
The reason why I introduced this option is that the patch created the
database "regression" using a createdb command that would feed from
template1 as pg_regress used --use-existing. And this combination
required to enforce --locale=C to avoid two regression diffs in
int8.sql and numeric.sql. It is possible to simplify things by
removing --use-existing and the database creation, so as pg_regress
handles the creation of the database "regression" with template0 to
avoid any problems related to locales.

Now, if you do *not* do that, I have noticed that we run into problems
when testing the TAP script with older versions, where pg_regress
would may not create the "regression" database, hence requiring an
extra createdb (perhaps that's better with --locale=C and
--template=template0) with --use-existing present for the pg_regress
command, command coming from the old branch.

Hmm. At the end of the day, I am wondering whether we should not give
up entirely on the concept of running the regression tests on older
branches in the TAP script of a newer branch. pg_regress needs to
come from the old source tree, meaning that we would most likely need
to maintain a set of compatibility tweaks that would most probably
rot over the time, and the buildfarm only cares about the possibility
to set up old instances by loading dumps rather than running
pg_regress. This would also make the switch to TAP much easier (no
need for the extra createdb or --locale AFAIK). So attempting to
maintain all that is going to be a PITA in the long term, and there is
nothing running that automatically anyway.

There is also the extra requirement to adjust dump files, but that's
independent of setting up the old instance to upgrade, and I don't
really plan to tackle that as of this thread (note that the buildfarm
client has extra tweaks regarding that).

Any thoughts about that?

> Right now pg_upgrade test.sh uses --wal-segsize 1, and that has helped
> identify several bugs. So I'd rather not give it up, even if it's a bit weird.

--allow-group-access was missing as well.

>> + my @regress_command = [
>> + $ENV{PG_REGRESS},
>> + '--schedule', "$oldsrc/src/test/regress/parallel_schedule",
>> + '--bindir', $oldnode->config_data('--bindir'),
>> + '--dlpath', $dlpath,
>> + '--port', $oldnode->port,
>> + '--outputdir', $outputdir,
>> + '--inputdir', $inputdir,
>> + '--use-existing'
>> + ];
>
> I think this should use --host (c.f. 7340aceed72). Or is it intending to use
> the host via env? If so, why is the port specified?

Hm. It looks like you are right here, so added.

>> + @regress_command = (@regress_command, @extra_opts);
>> +
>> + $oldnode->command_ok(@regress_command,
>> + 'regression test run on old instance');
>
> I also think this should take EXTRA_REGRESS_OPTS into account - test.sh did.

This is already taken into account, as of the @extra_opts bits.

>> +# After dumping, update references to the old source tree's regress.so
>> +# to point to the new tree.
>> +if (defined($ENV{oldinstall}))
>> +{
>
> Kinda asking for its own function...

I am not sure this is a gain in readability just for this part, FWIW,
and once you drop support for setting up an old instance with
pg_regress, that would not be needed.

>> +# Update the instance.
>> +$oldnode->stop;
>> +
>> +# Time for the real run.
>
> As opposed to the unreal one?

Removed that.

>> +# pg_upgrade would complain if PGHOST, so as there are no attempts to
>> +# connect to a different server than the upgraded ones.
>
> "complain if PGHOST"?

There is no need for this tweak once check_pghost_envvar() is fixed to
be able to understand Windows paths. This was not working under the
CI on Windows anyway, but the check_pghost_envvar() fix does.

A last thing that was missing from the patch, AFAIK, is to scan the
contents of pg_upgrade_output.d/log, if anything is left around after
a failure so as the buildfarm is able to report all the logs.
pg_upgrade's .gitignore has no need for a refresh, as well.

I have split the patch set into two parts:
- 0001 is a fix for check_pghost_envvar() with the addition of a call
to is_unixsock_path() to make sure that Windows paths are handled.
This has proved to be enough to make the CI report green on Windows.
- 0002 is the test, with all the fixes and adjustments mentioned
upthread, including making sure that the tests can be run with older
branches, for now.
--
Michael

Attachment Content-Type Size
v8-0001-Fix-sanity-check-for-PGHOST-ADDR-in-pg_upgrade-wi.patch text/x-diff 1.6 KB
v8-0002-Switch-tests-of-pg_upgrade-to-use-TAP.patch text/x-diff 27.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-02-15 04:04:46 Re: do only critical work during single-user vacuum?
Previous Message Tom Lane 2022-02-15 03:58:32 Re: Better error message for unsupported replication cases