Re: Fwd: 8.1beta2 vacuum analyze hanging on idle database

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <alvherre(at)alvh(dot)no-ip(dot)org>, <jnasby(at)pervasive(dot)com>
Subject: Re: Fwd: 8.1beta2 vacuum analyze hanging on idle database
Date: 2005-10-05 18:22:41
Message-ID: s343d3ac.066@gwmta.wicourts.gov
Lists: pgsql-hackers

I see that my initial post never made it through to the list. I assume
this was some technical failure, so I'm adding it back for this reply.

It appears that we did not stop the postmaster between incidents.
We have now done so.

The software we are running is a build from the beta2 release, with
no special options specified at ./configure time. Would you expect
such a build to include the debug info you wanted? We will add
--enable-debug to our next build, but I ask because, while showing
our DBA manager the diagnostic steps, I ran gdb's bt against an idle
connection and got:

(gdb) bt
#0 0x40197b46 in recv () from /lib/i686/libc.so.6
#1 0x0813485f in secure_read ()
#2 0x08138f7b in pq_recvbuf ()
#3 0x081393a9 in pq_getbyte ()
#4 0x08195565 in PostgresMain ()
#5 0x081716c5 in ServerLoop ()
#6 0x0817232e in PostmasterMain ()
#7 0x0813aad8 in main ()

Which seemed to show reasonable information, to my untrained eye.
That got me wondering whether the "(corrupt stack?)" note on the
previous backtrace might be something real. Both were run against
processes running the same copy of the backend software.

-Kevin


>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 10/04/05 4:22 PM >>>
"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> I can't hold the database in the problem state much longer -- if there
> are any other diagnostic steps you'd like me to take before we clear
> the problem, please let me know very soon.

Not at the moment ...

> INFO: vacuuming "pg_catalog.pg_constraint"
> INFO: index "pg_constraint_conname_nsp_index" now contains 35 row versions in 2 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "pg_constraint_conrelid_index" now contains 35 row versions in 2 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> [Hanging here for about 2 hours so far.]

Interesting that it seems to consistently be having a problem with a
pg_constraint index. Have you restarted the postmaster at any point
since this trouble began? If it were something like an unreleased
buffer pin, then it could persist indefinitely until postmaster restart.

> (gdb) bt
> #0 0x40198488 in semop () from /lib/i686/libc.so.6
> #1 0x4a2c8cf8 in ?? ()
> #2 0xbfffb2e0 in ?? ()
> #3 0xbfffb308 in ?? ()
> #4 0x0816a3d4 in PGSemaphoreLock ()
> Previous frame inner to this frame (corrupt stack?)

This is fairly unhelpful :-(. The next stack frame down would have told
us something useful, but really we need to see the whole call stack.

It may be that you need to rebuild Postgres with --enable-debug in order
to get something gdb can work with.

regards, tom lane
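
A minimal sketch, not part of the original exchange, of how the two
backtraces in this thread could be compared mechanically. It counts the
frames gdb resolved to a symbol, the frames it printed as "??", and
whether gdb gave up with its "(corrupt stack?)" note. The function name
summarize_backtrace and the regular expression are assumptions made for
illustration; the script only tallies what gdb printed and says nothing
about why a frame is unresolved.

```python
import re

# Matches gdb frame lines such as "#4 0x0816a3d4 in PGSemaphoreLock ()".
# Assumption: frames follow the "#N 0xADDR in NAME (" shape seen in this thread.
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+in\s+(\S+)\s+\(")


def summarize_backtrace(text):
    """Return (resolved_names, unresolved_names, truncated) for pasted gdb output."""
    resolved, unresolved = [], []
    truncated = False
    for raw in text.splitlines():
        # Drop email quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if "corrupt stack?" in line:
            # gdb stopped unwinding here, so the trace is incomplete.
            truncated = True
            continue
        match = FRAME_RE.match(line)
        if not match:
            continue  # prompts such as "(gdb) bt" and blank lines
        name = match.group(3)
        if name == "??":
            unresolved.append(name)
        else:
            resolved.append(name)
    return resolved, unresolved, truncated


if __name__ == "__main__":
    # The hung backend's backtrace, as quoted in this thread.
    hung = """\
> #0 0x40198488 in semop () from /lib/i686/libc.so.6
> #1 0x4a2c8cf8 in ?? ()
> #2 0xbfffb2e0 in ?? ()
> #3 0xbfffb308 in ?? ()
> #4 0x0816a3d4 in PGSemaphoreLock ()
> Previous frame inner to this frame (corrupt stack?)
"""
    resolved, unresolved, truncated = summarize_backtrace(hung)
    print("resolved:", resolved)
    print("unresolved frames:", len(unresolved))
    print("truncated:", truncated)
```

Running the same function over the idle-connection backtrace gives a
quick side-by-side view of how much of each stack gdb was able to name.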
