Re: Strange failure on mamba

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange failure on mamba
Date: 2022-11-30 05:42:25
Message-ID: 20221130054225.3ydn5bxdrmel5ssu@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-11-29 20:44:34 -0500, Tom Lane wrote:
> Thanks to commit 51b5834cd I've now been able to capture some info
> from mamba's last couple of failures [1][2]. Sure enough, what is
> happening is that postmaster children are getting stuck in recursive
> rtld symbol resolution. A couple of the stack traces I collected are
>
> #0 0xfdeede4c in ___lwp_park60 () from /usr/libexec/ld.elf_so
> #1 0xfdee3e08 in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
> #2 0xfdee59e4 in dlopen () from /usr/libexec/ld.elf_so
> #3 0x01e54ed0 in internal_load_library (
> libname=libname@entry=0xfd74cc88 "/home/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/home/buildfarm/bf-data/HEAD/inst/lib/postgresql/libpqwalreceiver.so") at dfmgr.c:239
> #4 0x01e55c78 in load_file (filename=<optimized out>, restricted=<optimized out>) at dfmgr.c:156
> #5 0x01c5ba24 in WalReceiverMain () at walreceiver.c:292
> #6 0x01c090f8 in AuxiliaryProcessMain (auxtype=auxtype@entry=WalReceiverProcess) at auxprocess.c:161
> #7 0x01c10970 in StartChildProcess (type=WalReceiverProcess) at postmaster.c:5310
> #8 0x01c123ac in MaybeStartWalReceiver () at postmaster.c:5475
> #9 MaybeStartWalReceiver () at postmaster.c:5468
> #10 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5131
> #11 <signal handler called>
> #12 0xfdee6b44 in _rtld_symlook_obj () from /usr/libexec/ld.elf_so
> #13 0xfdee6fc0 in _rtld_symlook_list () from /usr/libexec/ld.elf_so
> #14 0xfdee7644 in _rtld_symlook_default () from /usr/libexec/ld.elf_so
> #15 0xfdee795c in _rtld_find_symdef () from /usr/libexec/ld.elf_so
> #16 0xfdee7ad0 in _rtld_find_plt_symdef () from /usr/libexec/ld.elf_so
> #17 0xfdee1918 in _rtld_bind () from /usr/libexec/ld.elf_so
> #18 0xfdee1dc0 in _rtld_bind_secureplt_start () from /usr/libexec/ld.elf_so
> Backtrace stopped: frame did not save the PC

Do you have any idea why the stack can't be unwound further here? Is it
possibly indicative of a corrupted stack? I guess we'd need to dig into the
NetBSD libc code :(

> which is pretty much just the same thing we were seeing before
> commit 8acd8f869 :->

What libraries is postgres linked against? I don't know whether -z now only
affects the "top-level" dependencies of postgres, or also the dependencies of
shared libraries that haven't been built with -z now. The only dependencies
that I could see being relevant are libintl and openssl.

You could check whether anything changes when you set LD_BIND_NOW; that should
cause "recursive" dependencies to be loaded eagerly as well.

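A rough sketch of one way to run that experiment from C, in case it's useful.
The helper names are made up for illustration and are not PostgreSQL code. It
assumes the runtime linker honors a non-empty LD_BIND_NOW and only reads it at
process startup, so the variable has to be in place before the exec:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: ask the runtime linker for eager
 * binding in any program exec'd from this process. The linker is assumed to
 * read the variable at process startup, so this does not change how the
 * calling process itself was bound.
 */
int
request_bind_now(void)
{
    return setenv("LD_BIND_NOW", "1", 1);   /* 0 on success, -1 on error */
}

/*
 * Hypothetical wrapper: run argv[0] with LD_BIND_NOW set. Returns only if
 * setenv() or execvp() fails.
 */
int
exec_with_bind_now(char *const argv[])
{
    if (request_bind_now() != 0)
    {
        perror("setenv");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A small main() could pass argv + 1 to exec_with_bind_now() to start the
postmaster under test. Exporting the variable in the shell that starts the
server would do the same thing.
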
Greetings,

Andres Freund
