Re: [PATCH] Log crashed backend's query (activity string)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Log crashed backend's query (activity string)
Date: 2011-09-06 22:57:07
Message-ID: CA+TgmoZ1xt=0cHvu4PBFze9SHZS5UhqXJPhqn30vUukvUUxEhQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 6, 2011 at 6:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Sep 6, 2011 at 5:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> And I doubt
>>> that the goal is worth taking risks for.
>
>> I am unable to count the number of times that I have had a customer
>> come to me and say "well, the backend crashed".  And I go look at
>> their logs and I have no idea what happened.
>
> gdb and print debug_query_string?

Surely you're kidding. These are customer systems which I frequently
don't even have access to. They don't always have gdb installed
(sometimes they are Windows systems), and if they do, the customer isn't
likely to know how to use it, and even if they do, they don't think any
better of us for needing such a tool to troubleshoot a crash. Even if
none of that were an issue, gdb is only going to work if you attach it
before the crash or have a core dump available. Typically you don't
know the crash is going to happen and core dumps aren't enabled
anyway.

> I don't dispute that this would be nice to have.  But I don't think that
> it's sane to compromise the postmaster's reliability in order to print
> information of doubtful accuracy.

In practice, I think very few crashes will clobber it. A lot of
crashes are going to be caused by a null pointer dereference in some
random part of the program, an assertion failure, the OOM killer, etc.
It's certainly POSSIBLE that it could get clobbered, but it shouldn't
be very likely; and as Marti says, with proper defensive coding, the
worst case scenario if it does happen should be some log garbage.

> If you want to do something that doesn't violate the system's basic
> design goals, think about setting up a SIGSEGV handler that tries to
> print debug_query_string via elog before crashing.  It might well crash
> too, but it won't be risking taking out more of the database with it.

I don't think that's adequate. You need to trap a lot more than just
SIGSEGV to catch all the crashes - there's also SIGABRT and SIGILL and
a bunch of others, including SIGKILL, which can't be caught at all. I
think you really, really need something that executes outside the
context of the dying process.
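For comparison, the in-process handler Tom suggests might look roughly like the sketch below. `current_query` is a hypothetical stand-in for debug_query_string and `install_crash_handlers` is a hypothetical helper. The handler writes with write(2) only, because stdio is not async-signal-safe, then restores the default action and re-raises so the process still dies. As Tom notes, it can itself fault if the pointer is bad. Listing SIGKILL shows the limitation described above: sigaction() rejects it with EINVAL, so a process taken out by the OOM killer never runs the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for debug_query_string; the backend would set it. */
static const char *volatile current_query = NULL;

/* Runs in the dying process; uses only async-signal-safe calls. */
static void
crash_handler(int signo)
{
    static const char prefix[] = "crashed while running: ";
    const char *q = current_query;
    size_t      len = 0;
    ssize_t     rc;

    rc = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    if (q != NULL)
    {
        while (len < 1024 && q[len] != '\0')   /* bounded hand-rolled strlen */
            len++;
        rc = write(STDERR_FILENO, q, len);
    }
    rc = write(STDERR_FILENO, "\n", 1);
    (void) rc;

    /* Restore the default action and re-raise so the process still dies. */
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Install crash_handler for each listed signal; returns how many succeeded. */
int
install_crash_handlers(void)
{
    static const int sigs[] = {SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT, SIGKILL};
    struct sigaction sa;
    int         installed = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
    {
        /* SIGKILL can never be caught: sigaction() fails with EINVAL. */
        if (sigaction(sigs[i], &sa, NULL) == 0)
            installed++;
    }
    return installed;
}
```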

TBH, I'm very unclear what could cause the postmaster to go belly-up
copying a bounded amount of data out of shared memory for logging
purposes only. It's surely possible to make the code safe against any
sequence of bytes that might be found there. The only real danger
seems to be that the memory access itself might trigger a segmentation
fault of some sort - but how is that going to happen? The child can't
unmap the address space in the parent, can it? If it's a real danger,
perhaps we could fork off a dedicated child process just to read the
relevant portion of shared memory and emit a log message - but I'm not
seeing what plausible scenario that would guard against.
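Whether a child can unmap the parent's address space can be checked directly. munmap() removes mappings only for the calling process, so after fork() a child that unmaps a shared region affects only its own view. The POSIX sketch below uses a hypothetical fixed-size slot standing in for the backend's activity buffer, and `read_slot_after_child_unmaps` is a hypothetical name. It shows that the child can still scribble over the shared bytes, which is why the parent copies a bounded amount and terminates it itself, but the parent's mapping survives.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SLOT_SIZE 64               /* hypothetical fixed-size activity slot */

/*
 * Map a shared slot, let a forked child overwrite it and munmap() it, then
 * read it back in the parent. Returns 0 on success, -1 on setup failure.
 */
int
read_slot_after_child_unmaps(char *out, size_t out_size)
{
    char       *slot;
    pid_t       pid;
    int         status;
    size_t      n;

    assert(out != NULL && out_size > 0);
    slot = mmap(NULL, SLOT_SIZE, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return -1;
    strcpy(slot, "SELECT crash_me()");

    pid = fork();
    if (pid == 0)
    {
        /* Child: overwrite the slot with no NUL terminator, then unmap it. */
        memset(slot, 'X', SLOT_SIZE);
        munmap(slot, SLOT_SIZE);   /* removes only the child's mapping */
        _exit(0);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
    {
        munmap(slot, SLOT_SIZE);
        return -1;
    }

    /* Parent: the mapping is still there; copy a bounded amount. */
    n = out_size - 1 < SLOT_SIZE ? out_size - 1 : SLOT_SIZE;
    memcpy(out, slot, n);
    out[n] = '\0';
    munmap(slot, SLOT_SIZE);
    return 0;
}
```

With an 8-byte output buffer, the parent reads back seven 'X' bytes: the contents were clobbered, but the read itself cannot fault.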

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company