Re: Condition variable live lock

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Condition variable live lock
Date: 2018-01-06 21:00:24
Message-ID: 26266.1515272424@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> No opinion without seeing what you propose to change.

> OK, will put out a proposal.

I began with the intention of making no non-cosmetic changes, but then
I started to wonder why ConditionVariablePrepareToSleep bothers with a
!proclist_contains test, when the calling process surely ought not be
in the list -- or any other list -- since it wasn't prepared to sleep.
And that led me down a different rabbit hole ending in the conclusion
that proclist.h could stand some improvements too. I do not like the
fact that it's impossible to tell whether a proclist_node is in any
proclist or not. Initially, a proclist_node contains zeroes which is
a distinguishable state, but proclist_delete_offset resets it to
next = prev = INVALID_PGPROCNO which looks the same as a node that's in a
singleton list. We should have it reset to the initial state of zeroes
instead, and then we can add assertions to proclist_push_xxx that the
supplied node is not already in a list. Hence, I propose the first
attached patch which tightens things up in proclist.h and then removes
the !proclist_contains test in ConditionVariablePrepareToSleep; the
assertion in proclist_push_tail supersedes that.

The second attached patch is the cosmetic changes I want to make in
condition_variable.c/.h.

I still think that we ought to change the Asserts on cv_sleep_target in
ConditionVariablePrepareToSleep and ConditionVariableSleep to be full
test-and-elog tests. Those are checking a global correctness property
("global" meaning "interactions between completely unrelated modules can
break this"), and they'd be extremely cheap compared to the rest of what
those functions are doing, so I think insisting that they be Asserts is
penny wise and pound foolish. Anybody besides Robert want to vote on
that?

Another loose end that I'm seeing here is that while a process waiting on
a condition variable will respond to a cancel or die interrupt, it will
not notice postmaster death. This seems unwise to me. I think we should
adjust the WaitLatch call to include WL_POSTMASTER_DEATH as a wake
condition and just do a summary proc_exit(1) if it sees that. I'd even
argue that that is a back-patchable bug fix.

regards, tom lane

Attachment Content-Type Size
tighten-proclist-assertions.patch text/x-diff 6.3 KB
cv-cosmetic-changes.patch text/x-diff 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-01-06 21:00:47 Re: Add default role 'pg_access_server_files'
Previous Message Simon Riggs 2018-01-06 20:29:07 Re: [HACKERS] [PROPOSAL] Temporal query processing with range types