pgsql: Rewrite ConditionVariableBroadcast() to avoid live-lock.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Rewrite ConditionVariableBroadcast() to avoid live-lock.
Date: 2018-01-06 00:21:40
Message-ID: E1eXcF2-0003W9-Ia@gemulon.postgresql.org
Lists: pgsql-committers

Rewrite ConditionVariableBroadcast() to avoid live-lock.

The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible". Thomas Munro found out
it was a bit too dumb. An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle. Given unlucky timing, this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.
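A minimal single-threaded Python model makes the failure concrete. It is not PostgreSQL code, and `broadcast_naive`, `requeues`, and `max_wakeups` are hypothetical names. A deque stands in for the wait queue. A waiter whose condition is still unsatisfied re-queues itself immediately after being woken, which models the unlucky timing described above. The `max_wakeups` cap exists only so that the model halts; the original loop had no such bound.

```python
from collections import deque


def broadcast_naive(queue, requeues, max_wakeups=1000):
    """Model of a broadcast that wakes waiters until the queue looks empty.

    queue: deque of (hypothetical) process ids waiting on the condition variable.
    requeues: ids whose condition is still unsatisfied; each wakeup makes
        such a process re-queue itself at the tail before we re-check.
    max_wakeups: artificial cap so this model terminates.
    Returns (wakeups, finished); finished is False if the cap was hit.
    """
    wakeups = 0
    while queue:                      # "until the wait queue is empty"
        if wakeups >= max_wakeups:
            return wakeups, False     # still spinning: live-lock
        pid = queue.popleft()
        wakeups += 1                  # wake the process at the head
        if pid in requeues:
            queue.append(pid)         # it re-queues itself immediately
    return wakeups, True


print(broadcast_naive(deque([1, 2, 3]), requeues=set()))               # (3, True)
print(broadcast_naive(deque([1, 2]), requeues={2}, max_wakeups=50))    # (50, False)
```

With no re-queuing waiter the loop finishes after one wakeup per process. A single waiter that keeps re-queuing is enough to keep the loop busy until the artificial cap is reached.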

To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore. There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel. But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.
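The sentinel idea can be sketched as a single-threaded Python model. This is not the actual condition_variable.c code, and `broadcast_sentinel` and the `"broadcaster"` id are illustrative assumptions. The broadcaster appends its own id to the tail of the queue. It then pops and wakes entries for as long as its sentinel is still queued. A waiter that re-queues itself now lands behind the sentinel, so one call wakes only the processes that were waiting at entry. The model does not cover concurrent signalers that might remove the sentinel, nor the other special cases the commit's code comments discuss.

```python
from collections import deque

SELF = "broadcaster"  # hypothetical id for the broadcasting process


def broadcast_sentinel(queue, requeues, self_id=SELF):
    """Wake every process queued at entry, using our own id as a sentinel.

    queue: deque of (hypothetical) process ids waiting on the condition variable.
    requeues: ids that re-queue themselves at the tail as soon as they are woken.
    Returns the number of real waiters woken (the sentinel is not counted).
    """
    if not queue:
        return 0                      # nothing to wake
    queue.append(self_id)             # sentinel marks the end of the original queue
    nwoken = 0
    have_sentinel = True
    while have_sentinel:
        pid = queue.popleft() if queue else None
        have_sentinel = self_id in queue   # stop once our sentinel is gone
        if pid is not None and pid != self_id:
            nwoken += 1                    # wake a real waiter
            if pid in requeues:
                queue.append(pid)          # re-queued behind the sentinel
    return nwoken


q = deque([1, 2])
print(broadcast_sentinel(q, requeues={2}), list(q))  # 2 [2]
```

Because the re-queued waiter 2 ends up behind the sentinel, the loop exits after one wakeup per original waiter and leaves 2 queued for a later signal or broadcast.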

Back-patch to v10 where condition_variable.c was introduced.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/1c77e990833a72039a71ca3f813ff6d05a4d09b9

Modified Files
--------------
src/backend/storage/lmgr/condition_variable.c | 82 +++++++++++++++++++++++++--
1 file changed, 77 insertions(+), 5 deletions(-)
