From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Strange failure on mamba |
Date: | 2022-11-30 05:42:25 |
Message-ID: | 20221130054225.3ydn5bxdrmel5ssu@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2022-11-29 20:44:34 -0500, Tom Lane wrote:
> Thanks to commit 51b5834cd I've now been able to capture some info
> from mamba's last couple of failures [1][2]. Sure enough, what is
> happening is that postmaster children are getting stuck in recursive
> rtld symbol resolution. A couple of the stack traces I collected are
>
> #0 0xfdeede4c in ___lwp_park60 () from /usr/libexec/ld.elf_so
> #1 0xfdee3e08 in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
> #2 0xfdee59e4 in dlopen () from /usr/libexec/ld.elf_so
> #3 0x01e54ed0 in internal_load_library (
> libname=libname@entry=0xfd74cc88 "/home/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/home/buildfarm/bf-data/HEAD/inst/lib/postgresql/libpqwalreceiver.so") at dfmgr.c:239
> #4 0x01e55c78 in load_file (filename=<optimized out>, restricted=<optimized out>) at dfmgr.c:156
> #5 0x01c5ba24 in WalReceiverMain () at walreceiver.c:292
> #6 0x01c090f8 in AuxiliaryProcessMain (auxtype=auxtype@entry=WalReceiverProcess) at auxprocess.c:161
> #7 0x01c10970 in StartChildProcess (type=WalReceiverProcess) at postmaster.c:5310
> #8 0x01c123ac in MaybeStartWalReceiver () at postmaster.c:5475
> #9 MaybeStartWalReceiver () at postmaster.c:5468
> #10 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5131
> #11 <signal handler called>
> #12 0xfdee6b44 in _rtld_symlook_obj () from /usr/libexec/ld.elf_so
> #13 0xfdee6fc0 in _rtld_symlook_list () from /usr/libexec/ld.elf_so
> #14 0xfdee7644 in _rtld_symlook_default () from /usr/libexec/ld.elf_so
> #15 0xfdee795c in _rtld_find_symdef () from /usr/libexec/ld.elf_so
> #16 0xfdee7ad0 in _rtld_find_plt_symdef () from /usr/libexec/ld.elf_so
> #17 0xfdee1918 in _rtld_bind () from /usr/libexec/ld.elf_so
> #18 0xfdee1dc0 in _rtld_bind_secureplt_start () from /usr/libexec/ld.elf_so
> Backtrace stopped: frame did not save the PC
Do you have any idea why the stack can't be unwound further here? Is it
possibly indicative of a corrupted stack? I guess we'd need to dig into the
NetBSD libc code :(
> which is pretty much just the same thing we were seeing before
> commit 8acd8f869 :->
What libraries is postgres linked against? I don't know whether -z now only
affects the "top-level" dependencies of postgres, or also the dependencies of
shared libraries that haven't been built with -z now. The only dependencies
that I could see being relevant are libintl and openssl.
You could try setting LD_BIND_NOW to see whether anything changes; that should
cause "recursive" dependencies to be bound eagerly as well.
Greetings,
Andres Freund