Re: diagnosing a db crash - server exit code 2

From: Robert Burgholzer <rburghol(at)vt(dot)edu>
To: "Burgholzer, Robert (DEQ)" <Robert(dot)Burgholzer(at)deq(dot)virginia(dot)gov>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: diagnosing a db crash - server exit code 2
Date: 2011-09-28 19:54:13
Message-ID: CACT-NGKP07Ku5y7=JdOFDx+PYv-5w70CAv=N8yuc4w2HeSotHQ@mail.gmail.com
Lists: pgsql-admin

Just a quick check-in on this problem. Thus far, I have managed to install
dbg and recompile PostgreSQL with the appropriate debugging
headers/variables.

I have been following the wiki page that Scott sent, and attempted to trace
one of my pg processes while making it crash. I have "succeeded" in causing
the crash on my dev server, which suggests at least that it is not due to
some spurious piece of faulty hardware on my primary. However, I had failed
to enable log file creation in the tracing process, so it seems no log file
resulted. Also, the ssh session that was monitoring the process died midway
due to a local network routing glitch.

If anyone has any suggestions as to how to run the trace via a nohup command
or something similar, that would be cool, since then I could let it run in
the background. I can reproduce the crash, but it is somewhat episodic: it
seems that I can run the same query several times before things blow up.
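
For what it is worth, one possible shape for such a background trace is
sketched below. It is an untested assumption, not something taken from the
wiki page: a small Python wrapper starts gdb in batch mode in its own
session, so a dropped ssh connection should not take it down, and sends all
gdb output to a log file. The function names, the log path, and the choice to
ignore SIGUSR1 are illustrative only.

```python
import subprocess
from typing import List


def build_gdb_command(pid: int) -> List[str]:
    """Build a non-interactive gdb invocation that attaches to one backend."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "set pagination off",
        # Assumption: SIGUSR1 is routine traffic for a backend, so do not stop on it.
        "-ex", "handle SIGUSR1 nostop noprint pass",
        "-ex", "continue",  # block here until the backend stops (e.g. on a crash)
        "-ex", "bt",        # then print the backtrace of wherever it stopped
    ]


def launch_detached(pid: int, log_path: str) -> subprocess.Popen:
    """Start gdb in a new session, appending stdout and stderr to log_path."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            build_gdb_command(pid),
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the terminal, similar to setsid
        )
```

Calling launch_detached(pid, "/tmp/pg_backend_trace.log") with the pid of
the backend in question (the path is a placeholder) would leave gdb waiting in
the background until that backend stops; the backtrace, if any, would then be
at the end of the log file.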

So, in short, I am quite confident that I can get this finished shortly, but
I am very short on time to devote to it for the next couple of days.

Thanks again for the help, and sorry that I am drawing this process out,

r.b.

On Mon, Sep 26, 2011 at 8:20 AM, Burgholzer, Robert (DEQ) <
Robert(dot)Burgholzer(at)deq(dot)virginia(dot)gov> wrote:

> Thanks to everyone, Tom, Joe, Scott, I will be in touch today as I move
> through this.
>
> Joe - if I need to have you log in for assistance, I am more than happy to
> make that happen.
>
> Regards,
> r.b.
>
>
>
> -----Original Message-----
> From: Joe Conway [mailto:mail(at)joeconway(dot)com]
> Sent: Fri 9/23/2011 5:03 PM
> To: Burgholzer, Robert (DEQ)
> Cc: Scott Marlowe; pgsql-admin(at)postgresql(dot)org
> Subject: Re: [ADMIN] diagnosing a db crash - server exit code 2
>
> On 09/23/2011 01:45 PM, Burgholzer, Robert (DEQ) wrote:
> > Joe - it appears that it ALWAYS involves pLR - even a simple median call
> > has caused it, though I must say it is something that is calculating the
> > median of somewhere around 10-20,000 pieces of data if that makes any
> > difference. I would be delighted to run any kind of debugging necessary
> > and share the info. I have an identical system that can reproduce the
> > errors (I am pretty certain that they HAVE previously). What I DON'T
> > have is any knowledge of the stack-trace/debugger things, but I'm
> > willing to learn, and I have a sysadmin who may be able to lend a hand.
>
> There is some good information about using gdb with postgres here:
>
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> If you need a hand, I would be happy to help you through the debugging
> via phone or even log in remotely if you can allow it. Just contact me
> off-list if you want to pursue that.
>
> Note that I made a new PL/R release just a few weeks ago which fixed
> several known crash-bugs. In particular these two pop out at me:
>
> - Fix missing calls to UNPROTECT.
> - Don't try to free an array element value when the
> array element is NULL
>
> Joe
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support
>
>

--
Robert W. Burgholzer
http://www.findingfreestyle.com/
On Facebook -
http://www.facebook.com/pages/Finding-Freestyle/151918511505970
Twitter - http://www.twitter.com/findfreestyle
What's a tweeted swim set? A Sweet? No, a #swaiku! Get them by following
http://twitter.com/findfreestyle
