Re: PATCH: backtraces for error messages

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Korry Douglas <korry(dot)douglas(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: backtraces for error messages
Date: 2018-06-23 12:22:19
Message-ID: CAMsr+YFivU93OjEH5FJb38gZzJVm1xHmjtawph2Lacr=ND7KRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22 June 2018 at 23:26, Korry Douglas <korry(dot)douglas(at)enterprisedb(dot)com>
wrote:

> A few misc comments:
>
> In general, +1.
>
> It might be nice to move the stack trace code into a separate function
> (not static to elog.c) - I have often found it useful to attach backtraces
> to objects so I can debug complex allocation/deallocation problems.
>

Good suggestion.

>
> The major expense in capturing a stack trace is in the symbolization of
> the stack addresses, not the stack walk itself. You typically want to
> symbolize the addresses at the time you generate the trace, but that's not
> always the case. For example, if you want to record the stack trace of
> every call to AllocSetAlloc() (and attach the trace to the AllocChunk)
> performance gets unbearable if you symbolize every trace. So a flag that
> specifies whether to symbolize would be nice too.
>

libunwind has some nifty caching that makes that a _lot_ cheaper; that's
part of why I went with it despite the extra lib requirement.

> If you don't symbolize, you need a way to capture the address range of
> each dynamically loaded shared object (so that you can later symbolize
> using something like addr2line).
>

Yeah. libunwind doesn't expose any interface to get that information, so
you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or
capture /proc/self/maps. You have to make sure that info makes it to the
log, and is re-output often enough not to be lost to log rotation, and is
invalidated and re-output if mappings change due to new lib loads etc. But
you don't want to print it all the time either.

Then you have to walk the stack and print the instruction pointers and
stack pointers and spit out raw traces for the user to reassemble.

I'd argue that if you're doing the sort of thing where you want a stack of
every AllocSetAlloc, you shouldn't be trying to do that in-process. That's
where tools like perf, systemtap, etc really come into their own. I'm
focused on making additional diagnostic info for errors and key log
messages collectable for systems that aren't easily accessed, like users
who have 12-hour read/response latencies and security setups as strict as
they are insane and nonsensical.

I'd have no objection to the option to disable symbolic back traces and
print the raw call stack. It's trivial to do; in fact, I only removed the
ip/sp info to keep the output more concise.

I'm not interested in writing anything to handle the library load address
mapping collection etc though. I don't see a really sensible way to do that
in a log-based system.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-06-23 12:25:16 Re: PATCH: backtraces for error messages
Previous Message Konstantin Knizhnik 2018-06-23 11:40:55 Re: libpq compression