Re: Continuing instability in insert-conflict-specconflict test

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Continuing instability in insert-conflict-specconflict test
Date: 2021-06-13 20:48:48
Message-ID: 130167.1623617328@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> The test material added in commit 43e0841 continues to yield buildfarm
> failures.

Yeah. It's only a relatively small fraction of the total volume of
isolation-test failures, so I'm not sure it's worth expending a
whole lot of effort on this issue by itself.

> On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
>> I think what we have to do to salvage this test is to get rid of the
>> use of NOTICE outputs, and instead have the test functions insert
>> log records into some table, which we can inspect after the fact
>> to verify that things happened as we expect.

> That sounds promising. Are those messages important for observing server
> bugs, or are they for debugging/modifying the test itself? If the latter, one
> could just change the messages to LOG.

I think they are important, because they show that the things we expect
to happen occur when we expect them to happen.

I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
into two problems:

1. You can't do an INSERT in an IMMUTABLE function. This is easily
worked around by putting the INSERT in a separate, volatile function.
That's cheating like mad of course, but so is the rest of the stuff
this test does in "immutable" functions.

2. The inserted events don't become visible from the outside until the
respective session commits. This seems like an absolute show-stopper.
After the fact, we can see that the events happened in the expected
relative order; but we don't have proof that they happened in the right
order relative to the actions visible in the test output file.
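
As a rough illustration of problem 2, here is a minimal sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL. The table name event_log and the function demo_visibility are made up for this example, and SQLite's locking differs from PostgreSQL's MVCC; the only point is that a row inserted inside an open transaction is not visible to a second connection until the writer commits.

```python
import os
import sqlite3
import tempfile


def demo_visibility():
    """Return (rows seen before commit, rows seen after commit) by an observer."""
    # Hypothetical scratch database; SQLite stands in for PostgreSQL here.
    path = os.path.join(tempfile.mkdtemp(), "events.db")

    # isolation_level=None means autocommit; BEGIN/COMMIT are issued explicitly.
    writer = sqlite3.connect(path, isolation_level=None)
    observer = sqlite3.connect(path, isolation_level=None)

    writer.execute("CREATE TABLE event_log (msg TEXT)")

    # The "test session" records an event inside a still-open transaction.
    writer.execute("BEGIN")
    writer.execute("INSERT INTO event_log VALUES ('acquired lock')")

    # The "outside" session looks at the log while the writer is uncommitted.
    before = observer.execute("SELECT count(*) FROM event_log").fetchall()[0][0]

    writer.execute("COMMIT")
    after = observer.execute("SELECT count(*) FROM event_log").fetchall()[0][0]

    writer.close()
    observer.close()
    return before, after


if __name__ == "__main__":
    print(demo_visibility())
```

The observer only learns about the event after the commit, so the log can confirm the relative order of events but cannot tie them to what the test driver saw at the time.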

> ... Any of the above won't solve things
> completely, because 3 of the 21 failures have diffs in the pg_locks output:

Yeah, it looks like the issue there is that session 2 reports completion
of its step before session 1 has a chance to make progress after obtaining
the lock. This seems to me to be closely related to the race conditions
I described upthread.

[ thinks for a while ... ]

I wonder whether we could do better with something along these lines:

* Adjust the test script's functions to emit a NOTICE *after* acquiring
a lock, not before.

* Annotate permutations with something along the lines of "expect N
NOTICE outputs before allowing this step to be considered complete",
which we'd attach to the unlock steps.

This idea is only half baked at present, but maybe there's something
to work with there. If it works, maybe we could improve the other
test cases that are always pseudo-failing in a similar way. For
example, in the deadlock tests, annotate steps with "expect step
Y to finish before step X".
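
To make the "expect N NOTICE outputs" idea a bit more concrete, here is a minimal sketch in Python of how a test driver might gate step completion. It is not the isolation tester's code or spec syntax; Step, StepTracker, and expected_notices are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    # Hypothetical annotation: how many NOTICEs must arrive before
    # the step may be reported as complete.
    expected_notices: int = 0


@dataclass
class StepTracker:
    step: Step
    query_done: bool = False
    notices_seen: int = 0

    def on_query_done(self) -> None:
        """Record that the step's own query has returned."""
        self.query_done = True

    def on_notice(self) -> None:
        """Record one NOTICE received while this step is pending."""
        self.notices_seen += 1

    def is_complete(self) -> bool:
        """Complete only when the query returned AND enough NOTICEs arrived."""
        return self.query_done and self.notices_seen >= self.step.expected_notices


if __name__ == "__main__":
    tracker = StepTracker(Step("unlock_1", expected_notices=2))
    tracker.on_query_done()
    print(tracker.is_complete())  # query returned, but no notices yet
    tracker.on_notice()
    tracker.on_notice()
    print(tracker.is_complete())
```

With a rule like this, the unlock step would not be printed as finished until the other session has reported acquiring its lock, which is what should remove the ordering race from the expected output.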

regards, tom lane
