From: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Fix a race condition in ConditionVariableTimedSleep() |
Date: | 2025-05-05 11:15:15 |
Message-ID: | 2143012f-ac50-4fd2-9697-63b41484713a@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
05.05.2025 10:52, Bertrand Drouvot пишет:
> Hi hackers,
>
> While working on wait event related stuff I observed a failed assertion:
>
> "
> TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line: 91
> "
>
> during pg_regress/database.
>
> To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever
> would trigger ConditionVariableBroadcast() in pgstat_report_wait_end():
Interestingly, our colleague stepped into same problem recently [1] . It
happened because he attempted to make overcomplex timeout (SIGALARM) handler.
But his solution was a bit different [2].
[1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru
[2]
https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch
And I believe, his solution is more elegant. Doesn't it?
But in first step, I doubt there should be any thing that cancels condition
variable during WaitLatch. Most probably you did wrong thing.
We convinced the colleague to rework the code to not trigger the issue in
first place.
--
regards
Yura Sokolov aka funny-falcon
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Lepikhov | 2025-05-05 11:38:39 | Re: MergeAppend could consider sorting cheapest child path |
Previous Message | Amit Kapila | 2025-05-05 10:59:18 | Re: Fix slot synchronization with two_phase decoding enabled |