Re: Printing backtrace of postgres processes

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: vignesh21(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pryzby(at)telsasoft(dot)com, stark(at)mit(dot)edu, torikoshia(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Printing backtrace of postgres processes
Date: 2022-11-30 06:13:02
Message-ID: CALj2ACVzvdjbXsUw+_pjSAoNkUWWhnmt2kCJCYDRfpOV-P0Ccw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 11, 2022 at 6:04 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Nov 11, 2022 at 7:59 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Thu, 10 Nov 2022 15:56:35 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> > > On Mon, Apr 18, 2022 at 9:10 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > The attached v21 patch has the changes for the same.
> > >
> > > Thanks for the patch. Here are some comments:
> > >
> > > 1. I think there's a fundamental problem with this patch, that is, it
> > > prints the backtrace when the interrupt is processed but not when
> > > interrupt is received. This way, the backends/aux processes will
> >
> > Yeah, but the obstacle was backtrace(3) itself. Andres pointed [1]
> > that that may be doable with some care (and I agree to the opinion).
> > AFAIS no discussions followed and things have been to the current
> > shape since then.
> >
> >
> > [1] https://www.postgresql.org/message-id/20201201032649.aekv5b5dicvmovf4%40alap3.anarazel.de
> > | > Surely this is *utterly* unsafe. You can't do that sort of stuff in
> > | > a signal handler.
> > |
> > | That's of course true for the current implementation - but I don't think
> > | it's a fundamental constraint. With a bit of care backtrace() and
> > | backtrace_symbols() itself can be signal safe:
> >
> > man 3 backtrace
> > > * backtrace() and backtrace_symbols_fd() don't call malloc() explic‐
> > > itly, but they are part of libgcc, which gets loaded dynamically
> > > when first used. Dynamic loading usually triggers a call to mal‐
> > > loc(3). If you need certain calls to these two functions to not
> > > allocate memory (in signal handlers, for example), you need to make
> > > sure libgcc is loaded beforehand.
>
> I missed that part. Thanks for pointing it out. The
> backtrace_symbols() seems to be returning a malloc'ed array [1],
> meaning it can't be used in signal handlers (if used, it can cause
> deadlocks as per [2]) and existing set_backtrace() is using it.
> Therefore, we need to either change set_backtrace() to use
> backtrace_symbols_fd() instead of backtrace_symobls() or introduce
> another function for the purpose of this feature. If done that, then
> we can think of preloading of libgcc which makes backtrace(),
> backtrace_symobols_fd() safe to use in signal handlers.
>
> Looks like we're not loading libgcc explicitly now into any of
> postgres processes, please correct me if I'm wrong here. If we're not
> loading it right now, is it acceptable to load libgcc into every
> postgres process for the sake of this feature?
>
> [1] https://linux.die.net/man/3/backtrace_symbols
> [2] https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock

I took a stab at it after discussing with Vignesh off-list. Here's
what I've done:

1. Make backtrace functions signal safe i.e. ensure libgcc is loaded,
just in case if it's not, by calling backtrace() function during the
early life of a process after SIGUSR1 signal handler and other signal
handlers are established.
2. Emit backtrace to stderr within the signal handler itself to keep
things simple so that we don't need to allocate dynamic memory inside
the signal handler.
3. Split the patch into 3 for ease of review, 0001 does preparatory
stuff, 0002 adds the function, 0003 adds the docs and tests.

I'm attaching the v23 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v22-0001-Move-sending-multiplexed-SIGUSR1-signal-code-to-.patch application/octet-stream 5.6 KB
v22-0002-Add-function-to-log-the-backtrace-of-the-specifi.patch application/octet-stream 13.3 KB
v22-0003-Add-documentation-and-tests-for-pg_log_backtrace.patch application/octet-stream 10.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-30 06:15:15 Re: Strange failure on mamba
Previous Message Andres Freund 2022-11-30 06:06:47 Re: Strange failure on mamba