Re: why can the isolation tester handle only one waiting process?

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why can the isolation tester handle only one waiting process?
Date: 2015-08-15 05:17:16
Message-ID: 20150815051716.GT5232@alvherre.pgsql
Lists: pgsql-hackers

Robert Haas wrote:
> On Fri, Aug 14, 2015 at 2:57 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> > Hmm, clearly you couldn't attach the info to the step itself, because a
> > step that blocks in one permutation doesn't necessarily block in every
> > permutation. You could attach it to each step that needed it in the
> > permutation, but then it wouldn't work to leave permutation
> > specification out for such a test. Maybe that's an acceptable
> > restriction if you cause the test to fail with a specific error instead
> > of stalling forever (which is what happens currently IIRC).
>
> After some study, I think the best thing to do here is change the way
> we handle the case where we reach a step that requires the use of a
> connection that is currently blocked on a lock. Right now, we handle
> that by declaring the permutation invalid; I'd like to change that so
> that we consider that a cue to wait for that connection to unblock itself.
> This will require a number of tests that currently blindly run through
> all permutations to specify a list of permutations, or they will hang.

Well, hanging forever may not be all that great. Buildfarm animals with
test processes stuck probably won't be happy. Maybe put a cap on the
time we're willing to wait; something like a minute should suffice for
all reasonable tests. At the same time I wonder if iterating as quickly
as possible is really such a hot idea; why don't we sleep even 100ms if
nothing is to be done immediately? That would reduce log traffic if you
have log_statement=all, for one thing ...
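
As a rough illustration of the two suggestions above (a cap on the total
wait, plus a short sleep between checks), here is a minimal sketch in
Python. It is not isolationtester code: the function name, the
is_blocked callback and the default values are assumptions made for the
example, with the one-minute cap and the 100ms sleep taken from the
figures mentioned above.

```python
import time


def wait_until_unblocked(is_blocked, timeout_s=60.0, poll_interval_s=0.1):
    """Poll is_blocked() until it returns False or timeout_s elapses.

    is_blocked is a hypothetical callback standing in for "is this
    connection still waiting on a lock?". Returns True if the connection
    unblocked in time, False if the cap was reached.
    """
    deadline = time.monotonic() + timeout_s
    while is_blocked():
        if time.monotonic() >= deadline:
            # Give up with a definite failure instead of hanging forever.
            return False
        # Sleep instead of spinning, so we don't issue checks back to back.
        time.sleep(poll_interval_s)
    return True
```

A caller would treat a False return as a test failure with a specific
error, rather than leaving the process stuck.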

I guess (from a patch author perspective) we can just use
isolationtester -n to produce appropriate permutation lines when
developing a spec file, and then prune the ones causing trouble.

FWIW I tried this with the spec I posted at
http://www.postgresql.org/message-id/20141212205254.GC1768@alvh.no-ip.org
and it seems to work fine (modulo a bug in the spec itself). I didn't
try reverting the patch that fixed the bug.

> But I'm not sure that's such a bad thing, because running through all
> permutations in those cases provides no additional test coverage.
> Each invalid permutation runs the sequence of steps only up until the
> point where it chooses an invalid next step. Therefore, each invalid
> permutation is testing an initial prefix of the steps tested by some
> valid permutation. If the "invalid" permutation ceased to be invalid,
> because the command at which we give up returned immediately rather
> than waiting, that would also change the test output of the other,
> valid test of which it is the initial prefix. And therefore, at least
> as it seems to me, testing the invalid permutations is just a waste of
> CPU time, and we'd be better off not doing it.
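
The prefix argument quoted above can be checked on a toy model. The
sketch below is Python, not isolationtester code, and everything in it
is an assumption made for illustration: two hypothetical sessions that
each take the same exclusive lock and then commit, with step names of
the form "s1_lock" and "s1_commit". It only checks the prefix property,
not the claim about test output.

```python
def interleavings(a, b):
    """Yield every ordering of a + b that keeps each session's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(perm):
    """Run one permutation on the toy lock model.

    Returns (executed_steps, valid). The permutation is invalid as soon
    as it schedules a step for a session that is still waiting.
    """
    holder = None  # session currently holding the lock
    waiter = None  # session blocked waiting for the lock
    executed = []
    for step in perm:
        session, action = step.split("_")
        if session == waiter:
            return executed, False
        executed.append(step)
        if action == "lock":
            if holder is None:
                holder = session
            else:
                waiter = session
        else:  # "commit": release the lock and wake the waiter, if any
            holder, waiter = waiter, None
    return executed, True


def classify(s1, s2):
    """Split all interleavings into valid runs and invalid-run prefixes."""
    valid, invalid_prefixes = [], []
    for perm in interleavings(s1, s2):
        executed, ok = run(perm)
        (valid if ok else invalid_prefixes).append(executed)
    return valid, invalid_prefixes


def is_prefix(prefix, seq):
    return seq[:len(prefix)] == prefix


valid, invalid_prefixes = classify(["s1_lock", "s1_commit"],
                                   ["s2_lock", "s2_commit"])
covered = all(any(is_prefix(p, v) for v in valid) for p in invalid_prefixes)
```

In this model every executed prefix of an invalid permutation is also
the start of some valid permutation, which is the sense in which the
invalid ones add no coverage.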

Well, the number of tests that actually exercise this is not large.
More time is spent in the timeout test, ISTM (the CPU is sleeping
during that, but it's still wasted clock time).

> Actually, I'm really rather wondering if the list of valid
> permutations should also be pruned for some of these tests. Some of
> these output files are thousands of lines long, and I'm not sure that
> somebody has really gone through that whole file and made sure that
> the output of each permutation is expected. And I'm sure some of them
> are functionally identical.

No objections there, but alter-table-1 and alter-table-2 seem to be the
only tests that have thousands of lines of expected output and also
have invalid permutations in the expected output. The only others with
1k+ lines are two-ids, receipt-reports and prepared-transactions, which
don't have invalid permutations.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
