Re: Continuing instability in insert-conflict-specconflict test

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Continuing instability in insert-conflict-specconflict test
Date: 2020-08-24 21:21:27
Message-ID: 20200824212127.jbf6zulrvmeyvcnu@alap3.anarazel.de
Lists: pgsql-hackers

On 2020-08-24 13:42:35 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> > We've seen repeated failures in the tests added by commit 43e084197:
> > ...
> > I dug into this a bit today, and found that I can reproduce the failure
> > reliably by adding a short delay in the right place, as attached.
> >
> > However, after studying the test awhile I have to admit that I do not
> > understand why all these failures look the same, because it seems to
> > me that this test is a house of cards. It *repeatedly* expects that
> > issuing a command to session X will result in session Y reporting
> > some notice before X's command terminates, even though X's command will
> > never meet the conditions for isolationtester to think it's blocked.
> >
> > AFAICS that is nothing but wishful thinking. Why is it that only one of
> > those places has failed so far?
>
> Are there really that many places expecting that? I've not gone through
> this again exhaustively by any means, but most places seem to print
> something only before waiting for a lock.

ISTM the issue at hand isn't so much that X expects something to be
printed by Y before it terminates, but that it expects the next step to
not be executed before Y unlocks. If I read the incorrect output
correctly, what happens is that "controller_print_speculative_locks" is
executed even though s1 hasn't yet acquired the next lock. Note how the
lines

    +s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
    +s1: NOTICE: acquiring advisory lock on 2

appear *after* "step controller_print_speculative_locks", not just after
"step s2_upsert: <... completed>".

To be clear, there'd still be a question of whether the NOTICE is
printed before or after "s2_upsert: <... completed>", but it looks to
me like the issue is bigger than that. It's easy enough to add another
wait in s2_upsert, but that doesn't help if the schedule simply moves on
to the next step because no session is actually waiting.
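To make the race concrete, here is a toy Python model, not isolationtester's actual code. It assumes that once s2_upsert completes, the tester sees no blocked session and issues the next step right away, while s1 (woken by the lock release) independently emits its NOTICEs. The function name `interleave` and both delay parameters are hypothetical; they only stand in for scheduling jitter.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Event:
    # Events sort by time first, then by text (only relevant on exact ties).
    time: float
    text: str


def interleave(s1_wakeup_delay: float, tester_next_step_delay: float) -> list[str]:
    """Toy model (hypothetical): s2_upsert completes at t=0.

    Since no session is blocked, the tester does not wait for s1; it sends
    the next step after tester_next_step_delay. Independently, s1 wakes up
    after s1_wakeup_delay and prints its NOTICEs. The output order is simply
    whichever happens first.
    """
    events = [
        Event(tester_next_step_delay, "step controller_print_speculative_locks"),
        Event(s1_wakeup_delay, "s1: NOTICE: blurt_and_lock_123() called for k1 in session 1"),
        Event(s1_wakeup_delay + 0.001, "s1: NOTICE: acquiring advisory lock on 2"),
    ]
    return [e.text for e in sorted(events)]


if __name__ == "__main__":
    # s1 wakes up quickly: NOTICEs appear before the controller step.
    print(interleave(0.1, 1.0))
    # s1 is slow (e.g. a delay injected in the right place): the controller
    # step runs first, matching the failing output.
    print(interleave(1.0, 0.1))
```

Adding another wait inside s2_upsert only shifts where t=0 is; nothing in this model makes the tester wait for s1, which is the point above.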

Am I missing something here?

Greetings,

Andres Freund
