Re: Backends stalled in 'startup' state: index corruption

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Sabino Mullane <greg(at)endpoint(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Backends stalled in 'startup' state: index corruption
Date: 2012-05-24 19:54:54
Message-ID: 29234.1337889294@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Sabino Mullane <greg(at)endpoint(dot)com> writes:
> Yesterday I had a client that experienced a sudden high load on
> one of their servers (8.3.5 - yes, I know. Those of you with
> clients will understand). When I checked, almost all connections
> were in a "startup" state, very similar to this thread:

> http://postgresql.1045698.n5.nabble.com/9-1-3-backends-getting-stuck-in-startup-td5670712.html

> Running a strace showed a lot of semop activity, and the logs showed a
> successful connection, then a 5 minute plus wait before a query was issued.
> So obviously, blocking on something.

Did you check I/O activity? I looked again at Jeff Frost's report and
now think that what he saw was probably a lot of seqscans on bloated
system catalogs, cf
http://archives.postgresql.org/message-id/28484.1337887297@sss.pgh.pa.us

> Unlike the thread above, I *did* find
> problems in the system catalogs. For example, both pg_class and pg_index
> gave warnings like this for every index during a VACUUM FULL
> VERBOSE tablename:

> WARNING: index "pg_class_relname_nsp_index" contains 7712 row versions,
> but table contains 9471 row versions
> HINT: Rebuild the index with REINDEX.

That's fairly interesting, but if it was a bloat situation then it
would've been the VACUUM FULL that fixed it rather than the REINDEX.
Did you happen to save the VERBOSE output? It'd be really useful to
know whether there was any major shrinkage of the core catalogs
(esp. pg_class, pg_attribute).
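
If the VERBOSE output was saved, the index warnings in it could be tallied with something like the sketch below. It assumes the log text contains warnings worded exactly like the one quoted above; the function name and regex are illustrative only and not part of any PostgreSQL tooling. It reports only how many entries each index is short, and says nothing about catalog shrinkage.

```python
import re

# Pattern for the WARNING text quoted above. The message may wrap across
# lines in a saved log, so whitespace after the comma is matched loosely.
WARNING_RE = re.compile(
    r'index "(?P<index>[^"]+)" contains (?P<index_rows>\d+) row versions,\s*'
    r'but table contains (?P<table_rows>\d+) row versions'
)


def missing_index_entries(log_text):
    """Map each index named in a warning to (table rows - index rows)."""
    shortfalls = {}
    for match in WARNING_RE.finditer(log_text):
        index_rows = int(match.group("index_rows"))
        table_rows = int(match.group("table_rows"))
        # If an index appears more than once, the last warning wins.
        shortfalls[match.group("index")] = table_rows - index_rows
    return shortfalls


if __name__ == "__main__":
    sample = (
        'WARNING: index "pg_class_relname_nsp_index" contains 7712 row versions,\n'
        "but table contains 9471 row versions\n"
        "HINT: Rebuild the index with REINDEX.\n"
    )
    for name, shortfall in missing_index_entries(sample).items():
        print(f"{name}: {shortfall} entries missing")
```

A large shortfall across every index of pg_class and pg_index would fit the "missing index entries" reading; it would not by itself distinguish that from the bloat question.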

> * Did anything in the 8.3 series fix this?

I think there are probably two independent issues here. The missing
index entries are clearly bad but it's not clear that they had anything
to do with the startup stall. There are a couple of fixes in recent
8.3.x releases that might possibly explain the index corruption,
especially if you're in the habit of reindexing the system catalogs
frequently.

regards, tom lane
