Re: Do we need to rethink how to parallelize regression tests to speedup CLOBBER_CACHE_ALWAYS?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Do we need to rethink how to parallelize regression tests to speedup CLOBBER_CACHE_ALWAYS?
Date: 2021-05-12 22:01:21
Message-ID: CAApHDvoi8-nH=vSU886pkRHUJZPYQcUd75fFfjZ-=V_gdQSFkA@mail.gmail.com
Lists: pgsql-hackers

On Thu, 13 May 2021 at 01:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There are a whole lot of cases where test Y depends on an earlier test X.
> Some of those dependencies are annotated in parallel_schedule, but I fear
> most are not.
>
> If we had a full list of such dependencies then we could imagine building
> a job scheduler that would dispatch any script that has no remaining
> dependencies.
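
For illustration, the kind of dependency-driven dispatch described in the
quoted text could look roughly like the sketch below. It uses Python's
standard graphlib module. The dependency edges shown are invented for the
example and are not taken from parallel_schedule, and dispatch_waves is a
hypothetical helper name.

```python
from graphlib import TopologicalSorter


def dispatch_waves(deps):
    """Group test scripts into waves; every script in a wave has no
    unfinished prerequisites, so the whole wave could run in parallel.

    deps maps a script name to the set of scripts it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # scripts with nothing left to wait for
        waves.append(ready)
        sorter.done(*ready)  # pretend the wave finished, unblocking dependents
    return waves


# Illustrative edges only; the real dependencies would have to be collected.
example_deps = {
    "boolean": set(),
    "create_table": set(),
    "create_index": {"create_table"},
    "select": {"create_index"},
}
waves = dispatch_waves(example_deps)
```

A real scheduler would more likely start each script as soon as its own
prerequisites finish rather than waiting for a whole wave, but the
bookkeeping is the same.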

I wonder if it could be done by starting a new parallel group and then
just moving existing tests into it, after first verifying that:

1. The test does not display results from any pg_catalog table or, if
it does, the filter is restrictive enough that there's no possibility
that the results will change due to other sessions changing the
catalogues.
2. If the test creates any new objects, those objects have names that
are unlikely to conflict with other tests, e.g. no table names like
t1.
3. The test does not INSERT/DELETE/UPDATE/VACUUM/ALTER/ANALYZE any
tables that exist for more than 1 test.
4. The test does not globally modify the system state, e.g. with
ALTER SYSTEM.
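
As a rough aid for that verification, a script could be scanned for the
obvious offenders before a human looks at it. The sketch below is only a
regex heuristic, not a parser: check_script, SHARED_TABLES and
GENERIC_NAME are hypothetical names, and the list of shared tables is a
small illustrative sample. It cannot judge whether a catalogue query's
filter is restrictive enough, so criterion 1 hits still need manual
review.

```python
import re

# Illustrative sample of tables used by more than one test (an assumption).
SHARED_TABLES = {"tenk1", "onek", "int4_tbl"}
# Names judged too generic to be safe, e.g. t1, foo, test2 (an assumption).
GENERIC_NAME = re.compile(r"^(t|tbl|tab|foo|bar|test)\d*$", re.IGNORECASE)

CATALOG_READ = re.compile(r"\b(?:from|join)\s+(?:pg_catalog\.)?(pg_\w+)", re.I)
# Temp objects are skipped on purpose: they are private to the session.
CREATE_OBJ = re.compile(
    r"\bcreate\s+(?:unlogged\s+)?(?:table|view|index|sequence|function|type)"
    r"\s+(?:if\s+not\s+exists\s+)?(\w+)", re.I)
MODIFY = re.compile(
    r"\b(insert\s+into|delete\s+from|update|vacuum\s+analyze|vacuum|analyze"
    r"|alter\s+table)\s+(?:only\s+)?(\w+)", re.I)
ALTER_SYSTEM = re.compile(r"\balter\s+system\b", re.I)


def check_script(sql, shared_tables=SHARED_TABLES):
    """Return (criterion, message) pairs for likely violations in sql."""
    problems = []
    for m in CATALOG_READ.finditer(sql):       # criterion 1
        problems.append((1, f"reads catalogue {m.group(1)}"))
    for m in CREATE_OBJ.finditer(sql):         # criterion 2
        if GENERIC_NAME.match(m.group(1)):
            problems.append((2, f"generic object name {m.group(1)}"))
    for m in MODIFY.finditer(sql):             # criterion 3
        if m.group(2).lower() in shared_tables:
            problems.append((3, f"modifies shared table {m.group(2)}"))
    if ALTER_SYSTEM.search(sql):               # criterion 4
        problems.append((4, "uses ALTER SYSTEM"))
    return problems


bad = """
CREATE TABLE t1 (a int);
SELECT relname FROM pg_class WHERE relname = 't1';
UPDATE tenk1 SET unique1 = 0;
ALTER SYSTEM SET work_mem = '1MB';
"""
good = """
CREATE TABLE parallel_safe_items (a int);
INSERT INTO parallel_safe_items VALUES (1);
"""
bad_criteria = [criterion for criterion, _ in check_script(bad)]
good_problems = check_script(good)
```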

We could document in parallel_schedule that tests in this particular
group must meet the above requirements, plus any others I've not
thought about. That list could be extended whenever we discover
something I've missed.

I hope that, now that we no longer have serial_schedule and there is
just one source of truth for tests, the comments in parallel_schedule
are more likely to be read and kept up to date.

I imagine there are many tests that could also just be run entirely
inside a single begin; ... commit; block. That would mean any catalogue
changes they make would not be visible to any other test that happens
to query the catalogues.
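
A minimal sketch of how such wrapping might be automated, assuming a
hypothetical helper wrap_in_transaction that is handed the text of a test
script. It deliberately gives up (returns None) on scripts that contain
commands PostgreSQL refuses to run inside a transaction block, such as
VACUUM, ALTER SYSTEM, CREATE DATABASE or CREATE INDEX CONCURRENTLY, and
on scripts that appear to manage transactions themselves.

```python
import re

# Commands that cannot run inside a transaction block.
NOT_IN_TXN = re.compile(
    r"\b(vacuum|alter\s+system|create\s+database|drop\s+database"
    r"|create\s+(?:unique\s+)?index\s+concurrently)\b", re.I)
# Lines that start with transaction control. This is conservative: it also
# trips on BEGIN/END lines inside PL/pgSQL bodies, which just means the
# script is left unwrapped.
OWN_TXN = re.compile(
    r"^\s*(begin|start\s+transaction|commit|rollback|end)\b", re.I | re.M)


def wrap_in_transaction(sql):
    """Return sql wrapped in begin; ... commit;, or None if unsafe to wrap."""
    if NOT_IN_TXN.search(sql) or OWN_TXN.search(sql):
        return None
    return "begin;\n" + sql.strip() + "\ncommit;\n"


wrapped = wrap_in_transaction(
    "create table parallel_safe_items (a int);\n"
    "drop table parallel_safe_items;\n"
)
```

Note that the changes stay hidden from other sessions only until the
commit, so a wrapped test would still need to drop whatever it created
before the end of the script.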

David
