Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Date: 2023-02-18 00:27:04
Message-ID: CA+hUKGJUH_UN2G1EHpmvKBaJMbyuhrrxORw8yzmO4BHwUdqEMw@mail.gmail.com
Lists: pgsql-hackers

I still have no theory for how this condition was reached despite a
lot of time thinking about it and searching for more clues. As far as
I can tell, the recent improvements to postmaster's signal and event
handling shouldn't be related: the state management and logic were
unchanged.

While failing to understand this, I worked[1] on a CI log indexing tool
with public reports that highlight this sort of thing[2], so I'll be
watching out for more evidence. Unfortunately I have no data from
before 1 Feb (cfbot previously wasn't interested in the past at all;
I'd need to get my hands on the commit IDs for earlier testing but I
can't figure out how to get those out of Cirrus or Github -- anyone
know how?). FWIW I have a thing I call bfbot for slurping up similar
data from the build farm. It's not pretty enough for public
consumption, but I do know that this assertion hasn't failed there,
except in the cases I mentioned earlier, and a load of failures on
lorikeet which was completely b0rked until recently.

[1] https://xkcd.com/974/
[2] http://cfbot.cputube.org/highlights/assertion-90.html
