Re: Add an optional timeout clause to isolationtester step.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add an optional timeout clause to isolationtester step.
Date: 2020-03-13 14:12:20
Message-ID: 32140.1584108740@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> On Wed, Mar 11, 2020 at 05:52:54PM -0300, Alvaro Herrera wrote:
>> On 2020-Mar-11, Tom Lane wrote:
>>> I'd like to see an attempt to rewrite some of the existing
>>> timeout-dependent test cases to use this facility instead of
>>> long timeouts.

>> +1. Those long timeouts are annoying enough that infrastructure to make
>> a run shorter in normal circumstances might be sufficient justification
>> for this patch ...

> I'm not familiar with those test so I'm probably missing something, but looks
> like all isolation tests that setup a timeout are doing so to test server side
> features (deadlock detection, statement and lock timeout). I'm not sure how
> adding a client-side facility to detect locks earlier is going to help reducing
> the server side timeouts?

The point is that those timeouts have to be set long enough for even a
very slow machine to reach a desired state before the timeout happens;
on faster machines the test is just uselessly sleeping for a long time,
because of the fixed timeout. My thought was that maybe the tests could
be recast as "watch for session to reach $expected_state and then do
the next thing", allowing them to be automatically adaptive to the
machine's speed. This might require some rather subtle test redesign
and/or addition of more infrastructure (to allow recognition of the
desired state and/or taking an appropriate next action). I'm prepared
to believe that not much can be done about timeouts.spec in particular,
but it seems to me that the long delays in the deadlock tests are not
inherent in what we need to test.

> For the REINDEX CONCURRENTLY failure test, the problem that needs to be solved
> isn't detecting that the command is blocked as it's already getting blocked on
> a heavyweight lock, but being able to reliably cancel a specific query as early
> as possible, which AFAICS isn't possible with current isolation tester:

Right, it's the same thing of needing to wait till the backend has reached
a particular state before you do the next thing.

> So we would actually only need something like this to make it work:
> step "<name>" [ CANCEL IF BLOCKED ] { <SQL }

I continue to resist the idea of hard-wiring this feature to query cancel
as the action-to-take. That will more or less guarantee that it's not
good for anything but this one test case. I think that the feature
should have the behavior of "treat this step as blocked once it's reached
state X", and then you make the next step in the permutation be one that
issues a query cancel. (Possibly, using pg_stat_activity and
pg_cancel_backend for that will be painful enough that we'd want to
invent separate script syntax that says "send a cancel to session X".
But that's a separate discussion.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Prabhat Sahu 2020-03-13 14:16:32 Re: [Proposal] Global temporary tables
Previous Message Alvaro Herrera 2020-03-13 14:12:04 Re: allow online change primary_conninfo