Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Date: 2023-03-13 14:00:00
Message-ID: 65ad5e5b-d5f5-d3da-10d0-3999b298feb4@gmail.com
Lists: pgsql-hackers

12.03.2023 10:18, Thomas Munro wrote:
> And again:
>
> TRAP: failed Assert("PMSignalState->PMChildFlags[slot] ==
> PM_CHILD_ASSIGNED"), File: "../src/backend/storage/ipc/pmsigna...
>
> https://cirrus-ci.com/task/6558324615806976
> https://api.cirrus-ci.com/v1/artifact/task/6558324615806976/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/002_pg_upgrade_old_node.log
> https://api.cirrus-ci.com/v1/artifact/task/6558324615806976/crashlog/crashlog-postgres.exe_0974_2023-03-11_13-57-27-982.txt

Here we have duplicate PIDs too:
...
2023-03-11 13:57:21.277 GMT [2152][client backend] [pg_regress/union][:0] LOG: 
disconnection: session time: 0:00:00.268 user=SYSTEM database=regression
host=[local]
...
2023-03-11 13:57:22.320 GMT [4340][client backend] [pg_regress/join][8/947:0]
LOG:  statement: set enable_hashjoin to 0;
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"),
File: "../src/backend/storage/ipc/pmsignal.c", Line: 329, PID: 2152

And I see the following code in postmaster.c:
CleanupBackend(int pid,
               int exitstatus)    /* child's exit status. */
{
...
    dlist_foreach_modify(iter, &BackendList)
    {
        Backend    *bp = dlist_container(Backend, elem, iter.cur);
        if (bp->pid == pid)
        {
            if (!bp->dead_end)
            {
                if (!ReleasePostmasterChildSlot(bp->child_slot))
...

so if a backend with the same PID happens to start (but has not reached
InitProcess() yet) when CleanupBackend() is called to clean up after a
just-finished backend, the slot of the starting one will be released.
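
To make the suspected interleaving concrete, here is a minimal standalone sketch. It is not postmaster code: SketchBackend, sketch_flags and the sketch_* functions are hypothetical names, the slot states are simplified stand-ins for PMChildFlags[], and it assumes that the entry for the newly started backend is found before the entry for the exited one when the list is scanned by PID.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the per-slot states kept in PMChildFlags[]. */
enum { PM_CHILD_UNUSED = 0, PM_CHILD_ASSIGNED, PM_CHILD_ACTIVE };

#define SKETCH_MAX_SLOTS 4

/* Hypothetical model of a BackendList entry: only the fields the race needs. */
typedef struct SketchBackend
{
    int pid;
    int child_slot;     /* 1-based index into sketch_flags[] */
} SketchBackend;

static int sketch_flags[SKETCH_MAX_SLOTS + 1];

/*
 * Release the slot of the first list entry whose PID matches, as the loop
 * in the excerpt does. Returns the released slot, or 0 if none matched.
 */
static int
sketch_cleanup_by_pid(const SketchBackend *list, int n, int pid)
{
    for (int i = 0; i < n; i++)
    {
        if (list[i].pid == pid)
        {
            sketch_flags[list[i].child_slot] = PM_CHILD_UNUSED;
            return list[i].child_slot;
        }
    }
    return 0;
}

/*
 * Models the kind of check behind the failed Assert: a starting child
 * expects its slot to still be ASSIGNED before marking it ACTIVE.
 */
static bool
sketch_mark_active(int slot)
{
    if (sketch_flags[slot] != PM_CHILD_ASSIGNED)
        return false;
    sketch_flags[slot] = PM_CHILD_ACTIVE;
    return true;
}

/* Returns true if PID reuse makes the starting child's check fail. */
static bool
sketch_pid_reuse_breaks_new_child(void)
{
    /* Assumption: the newer entry is found first (index 0 = list head). */
    SketchBackend list[2] = {
        { 2152, 2 },    /* starting backend, reused PID, not yet active */
        { 2152, 1 },    /* exited backend, not yet cleaned up */
    };

    memset(sketch_flags, 0, sizeof(sketch_flags));
    sketch_flags[1] = PM_CHILD_ACTIVE;
    sketch_flags[2] = PM_CHILD_ASSIGNED;

    /* The exited child is reaped and identified only by its PID. */
    int released = sketch_cleanup_by_pid(list, 2, 2152);

    /* The wrong slot (2) was released, so the new child's check fails. */
    return released == 2 && !sketch_mark_active(2);
}
```

Calling sketch_pid_reuse_breaks_new_child() from a small main() returns true under these assumptions: the PID match picks the starting backend's entry, its slot goes back to UNUSED, and the later ASSIGNED check fails. If the exited backend's entry were found first, the sketch would release the correct slot, so the ordering assumption matters.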

I have yet to construct a reproduction of this case, but it seems to me that
the race condition is not impossible here.

Best regards,
Alexander
