Re: PATCH: backtraces for error messages

From: Korry Douglas <korry(dot)douglas(at)enterprisedb(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: backtraces for error messages
Date: 2018-06-25 12:25:29
Message-ID: CAPqDkTOiDQ+9qMVEgba-eZJrygKTQNjzGbuS_Q1axivg0H7RUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 23, 2018 at 8:22 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 22 June 2018 at 23:26, Korry Douglas <korry(dot)douglas(at)enterprisedb(dot)com>
> wrote:
>
>> A few misc comments:
>>
>> In general, +1.
>>
>> It might be nice to move the stack trace code into a separate function
>> (not static to elog.c) - I have often found it useful to attach backtraces
>> to objects so I can debug complex allocation/deallocation problems.
>>
>
> Good suggestion.
>
>
>>
>> The major expense in capturing a stack trace is in the symbolization of
>> the stack addresses, not the stack walk itself. You typically want to
>> symbolize the addresses at the time you generate the trace, but that's not
>> always the case. For example, if you want to record the stack trace of
>> every call to AllocSetAlloc() (and attach the trace to the AllocChunk)
>> performance gets unbearable if you symbolize every trace. So a flag that
>> specifies whether to symbolize would be nice too.
>>
>
> libunwind has some nifty caching that makes that a _lot_ cheaper; that's
> part of why I went with it despite the extra lib requirement.
>

I've not used libunwind before - looking through the docs now. It does
seem much more flexible than backtrace(), thanks.

>
>
>> If you don't symbolize, you need a way to capture the address range of
>> each dynamically loaded shared object (so that you can later symbolize
>> using something like addr2line).
>>
>
> Yeah. libunwind doesn't expose any interface to get that information, so
> you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or
> capture /proc/self/maps. You have to make sure that info makes it to the
> log, and is re-output often enough not to be lost to log rotation, and is
> invalidated and re-output if mappings change due to new lib loads etc. But
> you don't want to print it all the time either.
>
> Then you have to walk the stack and print the instruction pointers and
> stack pointers and spit out raw traces for the user to reassemble.
>
> I'd argue that if you're doing the sort of thing where you want a stack of
> every AllocSetAlloc, you shouldn't be trying to do that in-process. That's
> where tools like perf, systemtap, etc really come into their own. I'm
> focused on making additional diagnostic info for errors and key log
> messages collectable for systems that aren't easily accessed, like users
> who have 12-hour read/response latencies and security setups as strict as
> they are insane and nonsensical.
>

Understood - I realized after I replied that instrumenting AllocSetAlloc()
is a bit more complex than I had suggested. When I need to capture the
call stack of each allocation, I store the stack trace in malloc'ed
buffers, otherwise I get recursive (in AllocSetAlloc()). I suspect that's
unreasonable for a general-purpose solution.

I'd have no objection to the option to disable symbolic back traces and
> print the raw call stack. It's trivial to do; in fact, I only removed the
> ip/sp info to keep the output more concise.
>
> I'm not interested in writing anything to handle the library load address
> mapping collection etc though. I don't see a really sensible way to do that
> in a log-based system.
>

Agreed - I tend to use (saved, not logged) stack traces to find memory
leaks that valgrind can't see (or more precisely, excessive resource
consumption that is not actually leaked).

Thanks for your consideration.

-- Korry

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2018-06-25 13:14:40 Re: Auto-partitioning in PostgreSQL 10
Previous Message Kuntal Ghosh 2018-06-25 12:18:54 Re: Incorrect fsync handling in pg_basebackup's tar_finish