From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recent failures in IsolationCheck deadlock-hard
Date: 2019-08-17 21:28:28
Message-ID: 22195.1566077308@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Tue, Aug 6, 2019 at 6:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, there have been half a dozen failures since deadlock-parallel
>> went in, mostly on critters that are slowed by CLOBBER_CACHE_ALWAYS
>> or valgrind. I've tried repeatedly to reproduce that here, without
>> success :-(. It's unclear whether the failures represent a real
>> code bug or just a problem in the test case, so I don't really want
>> to speculate about fixes till I can reproduce it.

> I managed to reproduce a failure that looks a lot like lousyjack's
> (note that there are two slightly different failure modes):
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2019-08-05%2011:33:02

> I did that by changing the deadlock_timeout values for sessions d1 and
> d2 to just a few milliseconds on my slowest computer, guessing that
> this might be a race involving the deadlock timeout and the time it
> takes for workers to fork and join a lock queue.

Yeah, I eventually managed to reproduce it (not too reliably) by
introducing a randomized delay into parallel worker startup.

The scenario seems to be: some d1a2 worker arrives so late that it's not
accounted for in the initial DeadLockCheck performed by some d2a1 worker.
The other d1a2 workers are released, and run and finish, but the late one
goes to sleep, with a long deadlock_timeout. If the next DeadLockCheck is
run by e1l's worker, that prefers to release d2a1 workers, which then all
run to completion. When the late d1a2 worker finally wakes up and runs
DeadLockCheck, *there is no deadlock to resolve*: the d2 session is idle,
not waiting for any lock. So the worker goes back to sleep, and we sit
till isolationtester times out.

Another way to look at it is that there is a deadlock condition, but
one of the waits-for constraints is on the client side where DeadLockCheck
can't see it.  isolationtester is waiting for d1a2 to complete before it
will execute d1c, which would release session d2; so d2 is effectively
waiting for d1, but DeadLockCheck doesn't know that and thinks it's
equally good to unblock either d1 or d2.
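To make that blind spot concrete, here's a toy sketch (plain Python, not
PostgreSQL code; the session names and edges are just modeled on the
description above) of deadlock detection as cycle search in a waits-for
graph.  With only the server-visible edge there is no cycle to find; it's
the client-side constraint that closes the loop:

```python
# Toy waits-for graph: an edge (a, b) means "a waits for b".
# Modeled loosely on the scenario above; not PostgreSQL's DeadLockCheck.

def has_cycle(edges):
    """Detect a cycle in a directed waits-for graph via DFS."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))

    return any(visit(n, set()) for n in graph)

# What the server-side detector sees: d1 waits for d2's lock, but the
# reverse dependency runs through the client, so there is no edge for it.
server_view = [("d1", "d2")]
assert not has_cycle(server_view)   # no deadlock found; everyone sleeps

# Adding the client-side constraint (isolationtester won't run d1c, which
# would release d2, until d1a2 completes) closes the loop:
full_view = server_view + [("d2", "d1")]
assert has_cycle(full_view)         # the real deadlock
```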

The attached proposed patch resolves this by introducing another lock
that is held by d1 and then d2 tries to take it, ensuring that the
deadlock detector will recognize that d1 must be released.

I've run several thousand iterations of the test this way without a
problem, where before the MTBF was maybe a hundred or two iterations
with the variable startup delay active. So I think this fix is good,
but I could be wrong. One notable thing is that every so often the
test takes ~10s to complete instead of a couple hundred msec. I think
that what's happening there is that the last deadlock condition doesn't
form until after all of session d2's DeadLockChecks have run, meaning
that we don't spot the deadlock until some other session runs it. The
test still passes, though. This is probably fine given that it would
never happen except with platforms that are horridly slow anyway.
Possibly we could shorten the 10s values to make that case complete
quicker, but I'm afraid of maybe breaking things on slow machines.

> Another thing I noticed is that all 4 times I managed to reproduce
> this, the "rearranged to" queue had only two entries; I can understand
> that d1's workers might not feature yet due to bad timing, but it's
> not clear to me why there should always be only one d2a1 worker and
> not more.

I noticed that too, and eventually realized that it's a
max_worker_processes constraint: we have two parallel workers waiting
in e1l and e2l, so if d1a2 takes four, there are only two slots left for
d2a1; and for reasons that aren't totally clear, we don't get to use the
last slot. (Not sure if that's a bug in itself.)

The attached patch therefore also knocks max_parallel_workers_per_gather
down to 3 in this test, so that we have room for at least 2 d2a1 workers.
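For what it's worth, the slot arithmetic checks out; the sketch below
assumes the stock default max_worker_processes = 8 (an assumption on my
part, not something verified against the buildfarm configs) and that the
test previously allowed d1a2 four workers, per the paragraph above:

```python
# Toy slot accounting for the test; assumes the stock default
# max_worker_processes = 8 (an assumption, not checked against the
# actual buildfarm configuration).
MAX_WORKER_PROCESSES = 8
parked_in_e1l_e2l = 2  # parallel workers already waiting in e1l and e2l

# With max_parallel_workers_per_gather = 4, d1a2 can claim four slots:
slots_for_d2a1 = MAX_WORKER_PROCESSES - parked_in_e1l_e2l - 4
assert slots_for_d2a1 == 2  # and in practice only one of these got used

# After knocking max_parallel_workers_per_gather down to 3:
slots_for_d2a1 = MAX_WORKER_PROCESSES - parked_in_e1l_e2l - 3
assert slots_for_d2a1 == 3  # room for at least 2 d2a1 workers
```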

regards, tom lane

Attachment Content-Type Size
deadlock-fix.patch text/x-diff 5.2 KB
