Re: Windows buildfarm members vs. new async-notify isolation test

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Windows buildfarm members vs. new async-notify isolation test
Date: 2019-12-03 16:40:48
Message-ID: 13003.1575391248@sss.pgh.pa.us
Lists: pgsql-hackers

Mark Dilger <hornschnorter(at)gmail(dot)com> writes:
> On 12/2/19 11:42 AM, Andrew Dunstan wrote:
>> On 12/2/19 11:23 AM, Tom Lane wrote:
>>> I'm a little baffled as to what this might be --- some sort of
>>> timing problem in our Windows signal emulation, perhaps? But
>>> if so, why haven't we found it years ago?

> I would be curious to see if there is a race condition in
> src/test/isolation/isolationtester.c between the loop starting
> on line 820:
>     while ((res = PQgetResult(conn)))
>     {
>         ...
>     }
> and the attempt to consume input that might include NOTIFY
> messages on line 861:
>     PQconsumeInput(conn);

In principle, the issue should not be there, because commits
790026972 et al should have ensured that the NOTIFY protocol
message comes out before ReadyForQuery (and thus, libpq will
absorb it before PQgetResult will return NULL). I think the
timing problem --- if that's what it is --- must be on the
backend side; somehow the backend is not processing the
inbound notify queue before it goes idle.
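
To make the ordering argument concrete, here is a toy model, not libpq code. It assumes a stream with one character per protocol message, using the real type bytes 'C' (CommandComplete), 'A' (NotificationResponse) and 'Z' (ReadyForQuery). The helper name notifies_seen_before_ready is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only: each character of "stream" stands for one protocol
 * message, identified by its type byte:
 *   'C' = CommandComplete, 'A' = NotificationResponse, 'Z' = ReadyForQuery.
 *
 * Hypothetical helper: absorb messages up to and including the first 'Z'
 * (the point at which a client would stop getting results for the query)
 * and report how many notifications were absorbed along the way.
 */
static int
notifies_seen_before_ready(const char *stream)
{
    int         seen = 0;

    assert(stream != NULL);

    for (const char *p = stream; *p != '\0'; p++)
    {
        if (*p == 'A')
            seen++;             /* notification absorbed while draining */
        else if (*p == 'Z')
            break;              /* backend is idle; stop reading */
    }
    return seen;
}
```

In this model the stream "CAZ" (backend processed its notify queue before going idle) yields 1, while "CZA" (backend went idle first) yields 0. The second case is the one that would make the test output unstable.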

Hmm ... just looking at the code again, could it be that there's
no well-placed CHECK_FOR_INTERRUPTS? Andrew, could you see if
injecting one in what 790026972 added to postgres.c helps?
That is,

        /*
         * Also process incoming notifies, if any.  This is mostly to
         * ensure stable behavior in tests: if any notifies were
         * received during the just-finished transaction, they'll be
         * seen by the client before ReadyForQuery is.
         */
+       CHECK_FOR_INTERRUPTS();
        if (notifyInterruptPending)
            ProcessNotifyInterrupt();
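
A minimal sketch of why the extra check might matter. It assumes an emulated signal is only queued on arrival and that its handler runs at an explicit check point. All names below are hypothetical stand-ins, not the backend's actual code, and the email above only asks whether the change helps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; every identifier here is hypothetical. */
static bool signal_queued = false;   /* emulated signal awaiting dispatch */
static bool notify_pending = false;  /* stands in for notifyInterruptPending */

/* Arrival of the emulated signal: it is queued, the handler does not run. */
static void
deliver_signal(void)
{
    signal_queued = true;
}

/* Check point: dispatch the queued signal, whose handler sets the flag. */
static void
check_for_interrupts(void)
{
    if (signal_queued)
    {
        signal_queued = false;
        notify_pending = true;
    }
}

/*
 * Model the end of a transaction.  Returns how many notifies would be
 * sent to the client before the backend reports that it is idle.
 */
static int
notifies_sent_before_idle(bool check_first)
{
    int         sent = 0;

    signal_queued = false;
    notify_pending = false;

    deliver_signal();           /* a notify signal arrived during the xact */

    if (check_first)
        check_for_interrupts(); /* the proposed extra check */

    if (notify_pending)         /* same shape as the patched test */
    {
        notify_pending = false;
        sent++;
    }
    return sent;
}
```

Under these assumptions, notifies_sent_before_idle(false) returns 0, because the flag is still unset when it is tested, and notifies_sent_before_idle(true) returns 1.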

regards, tom lane
