Re: Printing backtrace of postgres processes

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Printing backtrace of postgres processes
Date: 2021-01-21 01:35:32
Message-ID: CAGRY4nw4ZLUiyCh13s_A0exCuAmdBgEbt3USDon0sOaDD0eKEg@mail.gmail.com

On Wed, 20 Jan 2021 at 01:31, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I'd argue that backtraces for those processes aren't really essential,
> > and indeed that trying to make the syslogger report its own backtrace
> > is damn dangerous.
>
> I agree. Ideally I'd like to be able to use the same mechanism
> everywhere and include those processes too, but surely regular
> backends and parallel workers are going to be the things that come up
> most often.
>
> > (Personally, I think this whole patch fails the safety-vs-usefulness
> > tradeoff, but I expect I'll get shouted down.)
>
> You and I are frequently on opposite sides of these kinds of
> questions, but I think this is a closer call than many cases. I'm
> convinced that it's useful, but I'm not sure whether it's safe. On the
> usefulness side, backtraces are often the only way to troubleshoot
> problems that occur on production systems. I wish we had better
> logging and tracing tools instead of having to ask for this sort of
> thing, but we don't.

Agreed.

In theory we should be able to do this sort of thing using external trace
and diagnostic tools like perf, systemtap, etc. In practice, these tools
tend to be quite version-sensitive and hard to get right without multiple
rounds of back-and-forth to deal with specifics of the site's setup,
installed debuginfo or lack thereof, specific tool versions, etc.

It's quite common to have to fall back on attaching gdb with a breakpoint
on a function in the export symbol table (so it works w/o debuginfo),
saving a core, and then analysing the core on a separate system on which
debuginfo is available for all the loaded modules. It's a major pain.

The ability to get a basic backtrace from within Pg is strongly desirable. IIRC
gdb's basic unwinder works without external debuginfo, if not especially
well. libunwind produces much better results, but that didn't pass the
extra-dependency bar when backtracing support was introduced to core
postgres.
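
For illustration, here is a minimal sketch of an in-process backtrace using
glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. This is not
the code from the patch under discussion: the helper name
print_self_backtrace and the MAX_FRAMES limit are made up for the example.
It assumes a platform that provides <execinfo.h>, and it makes no claim to be
safe to call from a signal handler.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64			/* arbitrary cap chosen for this sketch */

/*
 * Hypothetical helper: write the caller's stack to the given file
 * descriptor, one frame per line, and return the number of frames captured.
 */
int
print_self_backtrace(int fd)
{
	void	   *frames[MAX_FRAMES];
	int			nframes;

	/* Collect return addresses for the current call stack. */
	nframes = backtrace(frames, MAX_FRAMES);

	/*
	 * backtrace_symbols_fd() writes directly to fd and does not call
	 * malloc(), unlike backtrace_symbols().
	 */
	if (nframes > 0)
		backtrace_symbols_fd(frames, nframes, fd);

	return nframes;
}
```

Calling print_self_backtrace(STDERR_FILENO) writes the frames to stderr.
Without debuginfo the output is limited to what the dynamic symbol table
offers, so static functions typically show up as bare addresses; linking with
-rdynamic exposes more names.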

On a side note, to help get better diagnostics I've also been meaning to
look into building --enable-debug with -ggdb3 so we can embed macro info,
and using dwz to deduplicate+compress the debuginfo so we can encourage
people to install it by default on production. I also want to start
exporting pointers to all the important data symbols for diagnostic use,
even if we do so in a separate ELF section just for debug use.
