Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
Date: 2026-05-13 00:46:25
Message-ID: CABPTF7XdDoLAoLKo7pOHmSqHniHf46Pw8Z=iqNx2uYi4J5ixBA@mail.gmail.com
Lists: pgsql-bugs

Hi Alexander,

On Wed, May 13, 2026 at 4:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> Hello hackers,
>
> 16.10.2023 12:00, PG Bug reporting form wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 18158
> Logged by: Alexander Lakhin
> Email address: exclusion(at)gmail(dot)com
> PostgreSQL version: 16.0
> Operating system: Ubuntu 22.04
> Description:
> ...
> With the following modification in pgstat_flush_pending_entries():
> +        if (nowait && (rand() % 10 == 0))
> +            did_flush = false;
> +        else
> +        {
>          did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
> +        }
>
> the issue reproduced easily:
> make -s check -C src/test/recovery/ PROVE_TESTS="t/012_subtransactions.pl"
> grep TRAP -r src/test/recovery/tmp_check/log
>
> # +++ tap check in src/test/recovery +++
> t/012_subtransactions.pl .. ok
> All tests successful.
> Files=1, Tests=12, 3 wallclock secs ( 0.01 usr 0.00 sys + 0.19 cusr 0.27
> csys = 0.47 CPU)
> Result: PASS
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:
> TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c",
> Line: 616, PID: 2410126
>
>
> It looks like skink produced this failure yesterday [1]:
> 241/316 recovery - postgresql:recovery/027_stream_regress ERROR 4374.08s exit status 1
> ...
> stderr:
> # Failed test 'check contents of pg_stat_statements on regression database'
> # at /home/bf/bf-build/skink/REL_17_STABLE/pgsql/src/test/recovery/t/027_stream_regress.pl line 177.
> # got: 'CREATE|f
> # SELECT|t'
> # expected: 'CREATE|t
> # DELETE|t
> # INSERT|t
> # SELECT|t
> # UPDATE|t'
> # Looks like you failed 1 test of 9.
>
> pgsql.build/testrun/recovery/027_stream_regress/log/027_stream_regress_primary.log
>
> 2026-05-11 14:09:34.397 CEST [4064132][walsender][40/0:0] LOG: released physical replication slot "standby_1"
> TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "../pgsql/src/backend/utils/activity/pgstat.c", Line: 612, PID: 4064132
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(ExceptionalCondition+0x5f) [0x45ae268]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(pgstat_report_stat+0x14d) [0x4492fe2]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x49315c) [0x449315c]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(shmem_exit+0x78) [0x4448f00]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x449020) [0x4449020]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(proc_exit+0x22) [0x44490c1]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x3f5dbd) [0x43f5dbd]
> ...
> 2026-05-11 14:09:34.629 CEST [4063470][postmaster][:0] LOG: server process (PID 4064132) was terminated by signal 6: Aborted
> 2026-05-11 14:09:34.629 CEST [4063470][postmaster][:0] DETAIL: Failed process was running: START_REPLICATION SLOT "standby_1" 0/3000000 TIMELINE 1
> 2026-05-11 14:09:34.631 CEST [4063470][postmaster][:0] LOG: terminating any other active server processes
>
> And perhaps (I can't find the full log now) one year ago [2]:
> 227/305 postgresql:recovery / recovery/027_stream_regress ERROR 3364.50s exit status 1
>
> [01:19:39.781](0.108s) not ok 9 - check contents of pg_stat_statements on regression database
> [01:19:39.781](0.000s) # Failed test 'check contents of pg_stat_statements on regression database'
> # at /home/bf/bf-build/skink/REL_17_STABLE/pgsql/src/test/recovery/t/027_stream_regress.pl line 173.
> [01:19:39.781](0.000s) # got: 'CREATE|f
> # SELECT|t'
> # expected: 'CREATE|t
> # DELETE|t
> # INSERT|t
> # SELECT|t
> # UPDATE|t'
>
> I've reproduced such failures with the above modification applied, just
> running:
> for i in {1..20}; do PROVE_TESTS="t/027*" make -s check -C src/test/recovery/ || break; done
>
> Reproduced on REL_17_STABLE and REL_16_STABLE (starting from dd8008e8e,
> which updated 027_stream_regress.pl).
>
> Not reproduced on REL_18_STABLE after 87a6690cc (coming from [3]).
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-05-11%2010%3A25%3A22
> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-18%2023%3A39%3A44
> [3] https://www.postgresql.org/message-id/yegnemsijlrhocsfgs7gs7irnczjgkom6fmk2a5u2b66pbvzwi%40ph2h3wppcvdy
>
> Best regards,
> Alexander

Thanks for reporting this. Since the failure no longer reproduces on
REL_18_STABLE after 87a6690cc69, it looks like back-patching an
equivalent of that commit to REL_17_STABLE and REL_16_STABLE would
resolve this issue.
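
For anyone following along, here is a toy model of the failure mode that
the fault injection above provokes, as I understand it: a nowait flush
that fails leaves entries pending, and a later report after stats have
been shut down reaches the assertion. This is only a sketch. ToyStats,
toy_flush_entry(), toy_flush_pending() and toy_report_stat_ok() are
hypothetical names made up for illustration, not PostgreSQL code, and
the lock_available flag stands in for the rand() % 10 injection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the relevant state; not a PostgreSQL type. */
typedef struct ToyStats
{
    bool is_shutdown;   /* models pgStatLocal.shmem->is_shutdown */
    int  pending;       /* number of locally pending stats entries */
} ToyStats;

/*
 * Models a per-entry flush callback. With nowait set, the flush gives up
 * instead of blocking when the (simulated) lock is not available.
 */
static bool
toy_flush_entry(bool nowait, bool lock_available)
{
    if (nowait && !lock_available)
        return false;
    return true;
}

/* Tries to flush every pending entry; returns how many remain pending. */
static int
toy_flush_pending(ToyStats *st, bool nowait, bool lock_available)
{
    int remaining = 0;

    assert(st != NULL);
    for (int i = 0; i < st->pending; i++)
    {
        if (!toy_flush_entry(nowait, lock_available))
            remaining++;
    }
    st->pending = remaining;
    return remaining;
}

/*
 * Models the check in the report path: with nothing pending the function
 * returns early; otherwise stats must not have been shut down yet.
 * Returns false where the real code would trip the Assert.
 */
static bool
toy_report_stat_ok(const ToyStats *st)
{
    if (st->pending == 0)
        return true;
    return !st->is_shutdown;
}
```

In this model, a backend that calls toy_flush_pending() with nowait set
and the lock unavailable keeps its entries, so a later
toy_report_stat_ok() after is_shutdown has been set returns false. A
backend with nothing pending passes the same check regardless of
is_shutdown.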

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
