Re: [PATCH] Log crashed backend's query (activity string)

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Marti Raudsepp" <marti(at)juffo(dot)org>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Log crashed backend's query (activity string)
Date: 2011-09-06 22:09:44
Message-ID: 4E6653D80200002500040DCC@gw.wicourts.gov
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>>> This patch adds the backend's current running query to the
>>> "backend crash" message.
>>
>> Sorry, this patch is entirely unacceptable. We cannot have the
>> postmaster's functioning depending on the contents of shared
>> memory still being valid ... most especially not when we know
>> that somebody just crashed, and could have corrupted the shared
>> memory in arbitrary ways. No, I don't think your attempts to
>> validate the data are adequate, nor do I believe they can be made
>> adequate.
>
> Why and why not?
>
>> And I doubt
>> that the goal is worth taking risks for.
>
> I am unable to count the number of times that I have had a
> customer come to me and say "well, the backend crashed". And I go
> look at their logs and I have no idea what happened. So then I
> tell them to include %p in log_line_prefix and set
> log_min_duration_statement=0 and call me if it happens again.
> This is a huge nuisance and a serious interference with attempts
> to do meaningful troubleshooting. When it doesn't add days or
> weeks to the time to resolution, it's because it prevents
> resolution altogether. We really, really need something like
> this.

I haven't had this experience more than a few times, but a few is
enough to recognize how painful it can be. It seems we're brave
enough to log *some* information at crash time, in spite of the risk
that memory may be corrupted in unpredictable ways. Sure, there is
a chance that when you think you're writing to the log you've
actually got a handle to a segment of a heap file, but that chance
is extremely slim -- and if that's where you're at you've probably
already written a 'segfault' message there, anyway. My gut feel is
this would allow diagnosis in a timely fashion often enough to save
more data than it puts at risk, to say nothing of people's time.

I don't know whether the patch on the table is coded as defensively
as it should be given the perilous times the new code would come
into play, but I don't think the idea should be rejected out of
hand.

-Kevin
