Re: Continuing instability in insert-conflict-specconflict test

From: Asim Praveen <pasim(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Continuing instability in insert-conflict-specconflict test
Date: 2020-08-31 15:10:46
Message-ID: F8DC434A-9141-451C-857F-148CCA1D42AD@vmware.com
Lists: pgsql-hackers

Let me (rather shamelessly) extract a couple of patches from the
patch set that was already shared in the fault injection framework
proposal [1].

The first patch adds new syntax to the isolation spec grammar to
explicitly mark a step that is expected to block (for reasons other
than locks). E.g.

permutation step1 step2& step3

The “&” suffix indicates that step2 is expected to block and that
isolationtester should move on to step3 without waiting for step2 to
finish.
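
As a rough illustration of how a runner could interpret the suffix,
here is a toy model in Python. It is not the grammar change from the
patch, and the function name parse_permutation is made up for this
sketch; it only shows the intended meaning of a trailing "&".

```python
def parse_permutation(line):
    """Split a permutation line into (step_name, expected_to_block) pairs.

    Toy model only: a step name followed by '&' is flagged as
    "expected to block", so a runner would not wait for it to finish.
    """
    words = line.split()
    if not words or words[0] != "permutation":
        raise ValueError("not a permutation line: %r" % line)

    steps = []
    for word in words[1:]:
        blocks = word.endswith("&")
        name = word[:-1] if blocks else word
        if not name:
            raise ValueError("'&' must directly follow a step name")
        steps.append((name, blocks))
    return steps


if __name__ == "__main__":
    # prints [('step1', False), ('step2', True), ('step3', False)]
    print(parse_permutation("permutation step1 step2& step3"))
```

A runner built on this would launch step2, skip the usual wait because
of the flag, and go straight on to step3.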

The second patch implements the insert-conflict scenario that is being
discussed here: one session waits (using a “suspend” fault) after
inserting a tuple into the heap relation but before updating the
index. Another session concurrently inserts a conflicting tuple into
the heap and the index, and commits. Then the fault is reset so that
the blocked session resumes and detects the conflict when updating
the index.
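
The ordering can be modelled outside the server with two threads. The
sketch below is only a simulation under my own assumptions: the class
SuspendFault and its methods are hypothetical names, not the fault
injector's API, and the "heap" and "index" are plain Python containers.
What it does show is the sequence of events, and that the controller
learns that the suspend point was reached by asking the fault rather
than by reading output from the blocked session.

```python
import threading


class SuspendFault:
    """Hypothetical stand-in for a "suspend" fault (not the real API)."""

    def __init__(self):
        self._triggered = threading.Event()
        self._reset = threading.Event()

    def hit(self):
        self._triggered.set()   # record that this code point was reached
        self._reset.wait()      # suspend the caller until the fault is reset

    def wait_until_triggered(self, timeout=5.0):
        return self._triggered.wait(timeout)

    def reset(self):
        self._reset.set()


def run_scenario():
    heap, index = [], {}        # toy "relation" and its unique "index"
    lock = threading.Lock()
    fault = SuspendFault()
    result = {}

    def session1():
        with lock:
            heap.append(("s1", 1))      # tuple is in the heap ...
        fault.hit()                     # ... but not yet in the index
        with lock:
            if 1 in index:              # conflict seen at index update
                result["s1"] = "conflict"
            else:
                index[1] = "s1"
                result["s1"] = "inserted"

    def session2():
        with lock:
            heap.append(("s2", 1))      # conflicting key, heap and index
            index[1] = "s2"
        result["s2"] = "committed"

    t1 = threading.Thread(target=session1)
    t1.start()
    # Query the fault status instead of waiting for session 1's output.
    if not fault.wait_until_triggered():
        raise RuntimeError("session 1 never reached the suspend point")
    session2()
    fault.reset()                       # let the blocked session resume
    t1.join()
    return result


if __name__ == "__main__":
    print(run_scenario())
```

Because session 2 only starts after the fault reports that it was
triggered, the outcome does not depend on thread scheduling: session 1
always finds the conflicting key when it resumes.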

> On 25-Aug-2020, at 9:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
>> I've spent the day fooling around with a re-implementation of
>> isolationtester that waits for all its controlled sessions to quiesce
>> (either wait for client input, or block on a lock held by another
>> session) before moving on to the next step. That was not a feasible
>> approach before we had the wait_event infrastructure, but it's
>> seeming like it might be workable now. Still have a few issues to
>> sort out though ...
>
> I wasted a good deal of time on this idea, and eventually concluded
> that it's a dead end, because there is an unremovable race condition.
> Namely, that even if the isolationtester's observer backend has
> observed that test session X has quiesced according to its
> wait_event_info, it is possible for the report of that fact to arrive
> at the isolationtester client process before test session X's output
> does.

The attached test evades this race condition by not depending on any
output from the blocked session X. Instead, it queries the status of
the injected fault to ascertain that a specific point in the code was
reached during execution.

>
> I think what we have to do to salvage this test is to get rid of the
> use of NOTICE outputs, and instead have the test functions insert
> log records into some table, which we can inspect after the fact
> to verify that things happened as we expect.
>

+1 to getting rid of NOTICE outputs.

Please refer to https://github.com/asimrp/postgres/tree/faultinjector
for the full patch set proposed in [1] that is now rebased against the
latest master.

Asim

[1] https://www.postgresql.org/message-id/flat/CANXE4Tc%2BRYRC48%3DdKYn1PvAjE26Ew4hh%3DXUjBRGj%3DJ9eob-S6g%40mail.gmail.com#cd02fa3b461102e97bcdc97e62dcc6d3

Attachments:
  0001-Add-syntax-to-declare-a-step-that-is-expected-to-blo.patch (application/octet-stream, 6.5 KB)
  0002-Speculative-insert-isolation-test-spec-using-fault-i.patch (application/octet-stream, 6.2 KB)