Re: Improving the isolationtester: fewer failures, less delay

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving the isolationtester: fewer failures, less delay
Date: 2021-06-16 01:18:20
Message-ID: 20210616011820.2ktysyfozfhqdtha@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Only halfway related: I wonder if we should remove the automatic
permutation stuff - it's practically never useful. Probably not worth
changing...

On 2021-06-15 17:09:00 -0400, Tom Lane wrote:
> +The general form of a permutation entry is
> +
> + "step_name" [ ( marker [ , marker ... ] ) ]
> +
> +where each marker defines a "blocking condition". The step will not be
> +reported as completed before all the blocking conditions are satisfied.

Minor suggestion: I think the folliwing would be a bit easier to read if
there first were a list of markers, and then separately the longer
descriptions. Right now it's a bit hard to see which paragraph
introduces a new type of marker, and which just adds further commentary.

> + /*
> + * Check for other steps that have finished. We must do this
> + * if oldstep completed; while if it did not, we need to poll
> + * all the active steps in hopes of unblocking oldstep.
> + */

Somehow I found the second sentence a bit hard to parse - I think it's
the "while ..." sounding a bit odd to me.

> + /*
> + * If the target session is still busy, apply a timeout to
> + * keep from hanging indefinitely, which could happen with
> + * incorrect blocker annotations. Use the same 2 *
> + * max_step_wait limit as try_complete_step does for deciding
> + * to die. (We don't bother with trying to cancel anything,
> + * since it's unclear what to cancel in this case.)
> + */
> + if (iconn->active_step != NULL)
> + {
> + struct timeval current_time;
> + int64 td;
> +
> + gettimeofday(&current_time, NULL);
> + td = (int64) current_time.tv_sec - (int64) start_time.tv_sec;
> + td *= USECS_PER_SEC;
> + td += (int64) current_time.tv_usec - (int64) start_time.tv_usec;
>+ if (td > 2 * max_step_wait)
> + {
> + fprintf(stderr, "step %s timed out after %d seconds\n",
> + iconn->active_step->name,
> + (int) (td / USECS_PER_SEC));
> + exit(1);
> + }
> + }
> + }
> }

It might be worth printing out the list of steps the being waited for
when reaching the timeout - it seems like it'd now be easier to end up
with a bit hard to debug specs. And one cannot necessarily look at
pg_locks or such anymore to debug em.

> @@ -833,6 +946,19 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
> }
> }
>
> + /*
> + * The step is done, but we won't report it as complete so long as there
> + * are blockers.
> + */
> + if (step_has_blocker(pstep))
> + {
> + if (!(flags & STEP_RETRY))
> + printf("step %s: %s <waiting ...>\n",
> + step->name, step->sql);
> + return true;
> + }

Might be a bug in my mental state machine: Will this work correctly for
PSB_ONCE, where we'll already returned before?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-06-16 01:22:51 Re: Improving the isolationtester: fewer failures, less delay
Previous Message David Rowley 2021-06-16 01:15:17 Re: disfavoring unparameterized nested loops