From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Nathan Bossart <nathan(at)postgresql(dot)org> |
Cc: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: Move named LWLock tranche requests to shared memory. |
Date: | 2025-09-17 02:35:56 |
Message-ID: | aMoejB3iTWy1SxfF@paquier.xyz |
Lists: | pgsql-committers |
Hi Nathan,
On Thu, Sep 11, 2025 at 09:15:12PM +0000, Nathan Bossart wrote:
> Move named LWLock tranche requests to shared memory.
>
> In EXEC_BACKEND builds, GetNamedLWLockTranche() can segfault when
> called outside of the postmaster process, as it might access
> NamedLWLockTrancheRequestArray, which won't be initialized. Given
> the lack of reports, this is apparently unusual, presumably because
> it is usually called from a shmem_startup_hook like this:
Since this commit has been merged, batta has kept failing. Here is
the first failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=batta&dt=2025-09-12%2002%3A05%3A01
I use this animal with a specific configuration:
shared_preload_libraries = 'pg_stat_statements'
compute_query_id = regress
regress_dump_restore
wal_consistency_checking
--enable-injection-points
The recovery tests 013_crash_restart.pl, 022_crash_temp_files.pl and
041_checkpoint_at_promote.pl stress some restart scenarios; not all of
them use injection points. I could not get a backtrace from the host.
However, I have come up with the following change in 013 that's able
to reproduce what I think is the same crash:
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -21,6 +21,8 @@ my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init(allows_streaming => 1);
+$node->append_conf('postgresql.conf',
+ "shared_preload_libraries = 'pg_stat_statements'");
 $node->start();
And here is the backtrace:
#0 0x000055fcdf6bc97a in NumLWLocksForNamedTranches () at lwlock.c:385
385 numLocks += NamedLWLockTrancheRequestArray[i].num_lwlocks;
(gdb) bt
#0 0x000055fcdf6bc97a in NumLWLocksForNamedTranches () at lwlock.c:385
#1 0x000055fcdf6bc9b3 in LWLockShmemSize () at lwlock.c:400
#2 0x000055fcdf65bda5 in CalculateShmemSize (num_semaphores=0x7ffcaf7a78e4) at ipci.c:130
#3 0x000055fcdf65c0b1 in CreateSharedMemoryAndSemaphores () at ipci.c:210
#4 0x000055fcdf42830c in PostmasterStateMachine () at postmaster.c:3223
#5 0x000055fcdf42703f in process_pm_child_exit () at postmaster.c:2558
#6 0x000055fcdf425729 in ServerLoop () at postmaster.c:1696
#7 0x000055fcdf424be1 in PostmasterMain (argc=4, argv=0x55fd0a8faa10) at postmaster.c:1403
#8 0x000055fcdef80a19 in main (argc=4, argv=0x55fd0a8faa10) at main.c:231
(gdb) p i
$3 = 0
(gdb) p NamedLWLockTrancheRequestArray[0]
Cannot access memory at address 0x7f15ee4ccc08
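One possible reading of this backtrace, stated as an assumption and not as a confirmed diagnosis: the request array now lives in shared memory, the crash restart tears that memory down, and the size calculation in the postmaster then follows a pointer into the old segment. The standalone sketch below models that pattern with plain heap memory. All names in it (TrancheRequest, request_tranche, segment_create, segment_destroy, num_locks_for_named_tranches) are hypothetical stand-ins, not the actual lwlock.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one named tranche request. */
typedef struct TrancheRequest
{
    char tranche_name[64];
    int  num_lwlocks;
} TrancheRequest;

TrancheRequest *local_requests = NULL; /* process-private copy */
TrancheRequest *request_array = NULL;  /* what the size code reads */
int             num_requests = 0;

/* Record a request in process-private memory. */
void request_tranche(const char *name, int nlocks)
{
    TrancheRequest *grown =
        realloc(local_requests, (num_requests + 1) * sizeof(TrancheRequest));

    if (grown == NULL)
        abort();
    local_requests = grown;
    snprintf(grown[num_requests].tranche_name,
             sizeof(grown[num_requests].tranche_name), "%s", name);
    grown[num_requests].num_lwlocks = nlocks;
    num_requests++;
    request_array = local_requests;
}

/* Same shape as the loop at the top of the backtrace. */
int num_locks_for_named_tranches(void)
{
    int total = 0;

    assert(num_requests == 0 || request_array != NULL);
    for (int i = 0; i < num_requests; i++)
        total += request_array[i].num_lwlocks;
    return total;
}

/* Model "shared memory": copy the requests and read from the copy. */
TrancheRequest *segment_create(void)
{
    size_t          sz = num_requests * sizeof(TrancheRequest);
    TrancheRequest *seg = malloc(sz > 0 ? sz : 1);

    if (seg == NULL)
        abort();
    if (num_requests > 0)
        memcpy(seg, local_requests, sz);
    request_array = seg;
    return seg;
}

/*
 * Tear the segment down. Without the last assignment, request_array
 * would still point into freed memory, and the next size calculation
 * would read through a dangling pointer.
 */
void segment_destroy(TrancheRequest *seg)
{
    free(seg);
    request_array = local_requests;
}
```

In this model, calling request_tranche() twice, then segment_create(), segment_destroy() and num_locks_for_named_tranches() in that order returns the same total before and after the teardown, because the pointer is switched back to the private copy first. Whether the real code should take this route is a separate question.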
Thanks,
--
Michael