Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Date: 2023-02-18 01:06:49
Message-ID: 20230218010649.vlksvn3o7jtshp4y@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
> I still have no theory for how this condition was reached despite a
> lot of time thinking about it and searching for more clues. As far as
> I can tell, the recent improvements to postmaster's signal and event
> handling shouldn't be related: the state management and logic was
> unchanged.

Yea, it's all very odd.

If you look at the log:

2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL: No valid identifier after ".".
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT: SELECT parse_ident('xxx.1020');
...
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File: "../src/backend/storage/ipc/pmsignal.c", Line: 329, PID: 5948
abort() has been called
...
2023-02-08 00:53:27.420 GMT postmaster[872] LOG: server process (PID 5948) was terminated by exception 0xC0000354
2023-02-08 00:53:27.420 GMT postmaster[872] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2023-02-08 00:53:27.420 GMT postmaster[872] LOG: terminating any other active server processes
2023-02-08 00:53:27.434 GMT postmaster[872] LOG: all server processes terminated; reinitializing

and that it's indeed the money test that failed:
money ... FAILED (test process exited with exit code 2) 7337 ms

it's very hard to understand how this stack can come to be:

00000085`f03ffa40 00007ff6`fd89faa8 ucrtbased!abort(void)+0x5a [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
00000085`f03ffa80 00007ff6`fd6474dc postgres!ExceptionalCondition(
char * conditionName = 0x00007ff6`fdd03ca8 "PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED",
char * fileName = 0x00007ff6`fdd03c80 "../src/backend/storage/ipc/pmsignal.c",
int lineNumber = 0n329)+0x78 [c:\cirrus\src\backend\utils\error\assert.c @ 67]
00000085`f03ffac0 00007ff6`fd676eff postgres!MarkPostmasterChildActive(void)+0x7c [c:\cirrus\src\backend\storage\ipc\pmsignal.c @ 329]
00000085`f03ffb00 00007ff6`fd59aa3a postgres!InitProcess(void)+0x2ef [c:\cirrus\src\backend\storage\lmgr\proc.c @ 375]
00000085`f03ffb60 00007ff6`fd467689 postgres!SubPostmasterMain(
int argc = 0n3,
char ** argv = 0x000001c6`f3814e80)+0x33a [c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
00000085`f03ffd90 00007ff6`fda0e1c9 postgres!main(
int argc = 0n3,
char ** argv = 0x000001c6`f3814e80)+0x2f9 [c:\cirrus\src\backend\main\main.c @ 192]

How can a process that we did notify crashing, that has already executed SQL
statements, end up in MarkPostmasterChildActive()?

> While failing to understand this, I worked[1] on CI log indexing tool
> with public reports that highlight this sort of thing[2], so I'll be
> watching out for more evidence. Unfortunately I have no data from
> before 1 Feb (cfbot previously wasn't interested in the past at all;
> I'd need to get my hands on the commit IDs for earlier testing but I
> can't figure out how to get those out of Cirrus or Github -- anyone
> know how?). FWIW I have a thing I call bfbot for slurping up similar
> data from the build farm. It's not pretty enough for public
> consumption, but I do know that this assertion hasn't failed there,
> except the cases I mentioned earlier, and a load of failures on
> lorikeet which was completely b0rked until recently.

> [1] https://xkcd.com/974/
> [2] http://cfbot.cputube.org/highlights/assertion-90.html

I think this extremely useful.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Imseih (AWS), Sami 2023-02-18 02:46:27 Re: Add index scan progress to pg_stat_progress_vacuum
Previous Message Andres Freund 2023-02-18 00:52:54 File* argument order, argument types