pgsql: Rewrite ConditionVariableBroadcast() to avoid live-lock.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Rewrite ConditionVariableBroadcast() to avoid live-lock.
Date: 2018-01-06 00:21:40
Message-ID: E1eXcF2-0003W7-Ia@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Rewrite ConditionVariableBroadcast() to avoid live-lock.

The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible". Thomas Munro found out
it was a bit too dumb. An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle. Given unlucky timing this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.

To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore. There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel. But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.

Back-patch to v10 where condition_variable.c was introduced.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/aced5a92bf46532466417ab485bc94006cf60d91

Modified Files
--------------
src/backend/storage/lmgr/condition_variable.c | 82 +++++++++++++++++++++++++--
1 file changed, 77 insertions(+), 5 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2018-01-06 00:42:57 pgsql: Reorder steps in ConditionVariablePrepareToSleep for more safety
Previous Message Michael Paquier 2018-01-06 00:10:51 Re: pgsql: Implement channel binding tls-server-end-point for SCRAM