Re: pg_dump test instability

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump test instability
Date: 2018-08-27 13:35:21
Message-ID: 20180827133521.GN3326@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Peter Eisentraut (peter(dot)eisentraut(at)2ndquadrant(dot)com) wrote:
> During the development of an unrelated feature, I have experienced
> failures in a pg_dump test case, specifically
>
> t/002_pg_dump.pl ....... 1825/6052
> # Failed test 'defaults_parallel: should not dump COPY
> fk_reference_test_table second'
> # at t/002_pg_dump.pl line 3454.
>
> This test sets up two tables connected by a foreign key and checks that
> a data_only dump dumps them ordered so that the primary key table comes
> first.
>
> But because of the way the tests are set up, it also checks that in all
> other dumps (i.e., non-data_only) it does *not* dump them in that order.
> This is kind of irrelevant to the test, but there is no way to express
> in the pg_dump tests to not check certain scenarios.

Hmmm, yeah, that's a good point. Most of the checks are set up that way
because it makes writing checks simpler and we have fewer of them, but
in this case we don't want that requirement to be levied on
non-data-only dumps.

> In a non-data_only dump, the order of the tables doesn't matter, because
> the foreign keys are added at the very end. In parallel dumps, the
> tables are in addition sorted by size, so the resultant order is
> different from a single-threaded dump. This can be seen by comparing
> the dumped TOCs of the defaults_dir_format and defaults_parallel cases.
> But it all happens to pass the tests right now.

Occationally I get lucky, apparently. :) Though I think this might have
been different before I reworked those tests to be simpler.

> In my hacking I have added another test table to the pg_dump test set,
> which seems to have thrown off the sorting and scheduling, so that the
> two tables now happen to come out in primary-key-first order anyway,
> which causes the test to fail.

Ok.

> I have developed the attached rough patch to add a third option to
> pg_dump test cases: besides like and unlike, add a "skip" option to
> disregard the result of the test.

If I read this correctly, the actual test isn't run at all (though, of
course, the tables are created and such).

In any case though, this doesn't completely solve the problem, does it?
Skipping 'defaults' also causes 'defaults_parallel' to be skipped and
therefore avoids the issue for now, but if some other change caused the
ordering to be different in the regular (non-data_only) cases then this
test would blow up again.

Looking back at the 9.6 tests, tests were only run for the runs where
they were explicitly specified, which lead to tests being missed that
shouldn't have been, and that lead to the approach now used where every
test is against every run.

As this test should really only ever be applied to the 'data_only' run,
it seems like we should have a 'run' list and for this test that would
be just 'data_only'. I haven't looked into what's supposed to happen
here in a parallel data-only test, but if we expect the ordering to
still be honored then we should probably have a test for that.

So, in short, I don't really agree with this 'skip' approach as it
doesn't properly solve this problem but would rather see a 'only_runs'
option or similar where this test is only tried against the data_only
run (and possibly a data_only_parallel_restore one, if one such was
added). In any case, please be sure to also update the documentation
under 'Definition of the tests to run.' (about line 335) whenever the
set of options that can be specified for the tests is changed, and
let's strongly discourage the use of this feature for most tests as
these kinds of one-off's are quite rare.

Thanks!

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-08-27 13:40:54 Re: simplehash.h comment
Previous Message Yugo Nagata 2018-08-27 12:51:24 Re: libpq debug log