Re: Continuing instability in insert-conflict-specconflict test

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Continuing instability in insert-conflict-specconflict test
Date: 2021-06-13 22:09:20
Message-ID: 135129.1623622160@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
>> * Adjust the test script's functions to emit a NOTICE *after* acquiring
>> a lock, not before.

> I suspect that particular lock acquisition merely unblocks the processing that
> reaches the final lock state expected by the test. So, ...

Ah, you're probably right.

>> * Annotate permutations with something along the lines of "expect N
>> NOTICE outputs before allowing this step to be considered complete",
>> which we'd attach to the unlock steps.

> ... I don't expect this to solve $SUBJECT. It could be a separately-useful
> feature, though.

I think it would solve it. In the examples at hand, where we have

@@ -377,8 +377,6 @@
pg_advisory_unlock

t
-s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
-s1: NOTICE: acquiring advisory lock on 2
step s2_upsert: <... completed>
step controller_print_speculative_locks:
SELECT pa.application_name, locktype, mode, granted

and then those notices show up sometime later, I'm hypothesizing
that the actions did happen on time, but the actual delivery of
those packets to the isolationtester client did not. If we
annotated step s2_upsert with a marker to the effect of "wait
for 2 NOTICEs from session 1 before considering this step done",
we could resolve that race condition. Admittedly, this is putting
a thumb on the scales a little bit, but it's hard to see how to
deal with inconsistent TCP delivery delays without that.
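
As an illustration of the "wait for N NOTICEs" idea, here is a minimal Python sketch of a toy completion check. It is not isolationtester code: the class, its methods, and the session label "s1" are hypothetical names chosen for the example. It assumes the tester sees two kinds of events, a NOTICE from some session and the arrival of the step's own result, and only reports the step as done once both conditions hold.

```python
from dataclasses import dataclass, field


@dataclass
class StepTracker:
    """Toy model of deciding when a step may be reported complete.

    Hypothetical names throughout; this does not mirror isolationtester.c.
    """
    # session name -> number of NOTICEs that must arrive first
    required_notices: dict = field(default_factory=dict)
    # session name -> number of NOTICEs seen since the step was launched
    seen_notices: dict = field(default_factory=dict)
    result_arrived: bool = False

    def on_notice(self, session: str) -> None:
        """Record one NOTICE delivered from the given session."""
        self.seen_notices[session] = self.seen_notices.get(session, 0) + 1

    def on_result(self) -> None:
        """Record that the step's own query result has arrived."""
        self.result_arrived = True

    def may_report_complete(self) -> bool:
        """True only if the result is in AND every notice quota is met."""
        if not self.result_arrived:
            return False
        return all(
            self.seen_notices.get(session, 0) >= needed
            for session, needed in self.required_notices.items()
        )


# s2_upsert annotated as "wait for 2 NOTICEs from session 1".
s2_upsert = StepTracker(required_notices={"s1": 2})
s2_upsert.on_result()                        # result arrives before the notices
early = s2_upsert.may_report_complete()      # still held back
s2_upsert.on_notice("s1")
s2_upsert.on_notice("s1")
late = s2_upsert.may_report_complete()       # now reportable
```

With this gating, the order in which the result packet and the NOTICE packets happen to be delivered no longer changes when the step is reported, which is the race the annotation is meant to hide.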

(BTW, I find that removing the pq_flush() call at the bottom of
send_message_to_frontend produces this failure and a bunch of
other similar ones.)

> Yeah, a special permutation list entry like PQgetResult(s8) could solve
> failures like
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Right. I'm visualizing it more like annotating s7a8 as requiring
s8a1 to complete first -- or vice versa, either would stabilize
that test result I think.
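
A rough Python sketch of that kind of ordering annotation follows. It is only a model under stated assumptions: the function name and the `must_follow` mapping are hypothetical, and the step names are borrowed from the example above. Completions that arrive early are held until the step they are annotated to follow has been reported.

```python
def report_order(arrivals, must_follow):
    """Return the order in which step completions get reported.

    arrivals:    step names in the order their results actually arrived.
    must_follow: hypothetical annotation map, step -> step that must be
                 reported first.
    """
    reported = []
    held = []
    for step in arrivals:
        held.append(step)
        # Release every held step whose prerequisite has been reported,
        # repeating until no further step can be released.
        progress = True
        while progress:
            progress = False
            for candidate in list(held):
                prerequisite = must_follow.get(candidate)
                if prerequisite is None or prerequisite in reported:
                    held.remove(candidate)
                    reported.append(candidate)
                    progress = True
    return reported


# s7a8 is annotated as requiring s8a1 to complete first.
annotation = {"s7a8": "s8a1"}
order_a = report_order(["s7a8", "s8a1"], annotation)
order_b = report_order(["s8a1", "s7a8"], annotation)
```

Both arrival orders yield the same reported order, so the expected output file would no longer depend on which result reaches the client first.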

We might be able to get rid of the stuff about concurrent step
completion in isolationtester.c if we required the spec files
to use annotations to force a deterministic step completion
order in all such cases.

regards, tom lane
