Re: 9.1.3 backends getting stuck in 'startup'

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Frost <jeff(at)pgexperts(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 9.1.3 backends getting stuck in 'startup'
Date: 2012-05-24 19:21:37
Message-ID: 28484.1337887297@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Jeff Frost <jeff(at)pgexperts(dot)com> writes:
> A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 x86_64
> get stuck in 'startup' mode. By that I mean the set_ps_output mode. Postgres
> is installed via Martin Pitt's packages.

I took another look at your original report here, and noticed something
that had escaped me the first time: the backtrace shows that what that
backend is doing is loading critical relcache entries the hard way, that
is, via seqscans on the system catalogs. This happens when the relcache
cache file (pg_internal.init) is missing or stale. A backend that has
to go through that will write a new pg_internal.init file when it's
done; but if many sessions are connecting at the same time, they'll each
have to do it until someone succeeds in creating a new cache file.

Another piece of evidence is here:

> #5 0x00007f7dd3d91ca4 in ReadBufferExtended (reln=0x7f7dd3b17540,
> forkNum=MAIN_FORKNUM, blockNum=201362, mode=RBM_NORMAL, strategy=<optimized
> out>) at bufmgr.c:246

It's possible that gdb is lying to us, but it sure looks like this read
request is for block 201362 of pg_attribute (I know it's pg_attribute
from the location of the heap_getnext call). That's 1.5Gb into the
relation, and who knows if it's anywhere near the end yet.

So in other words, we've got a boatload of backends all seqscanning a
pretty darn large relation, and doing so multiple times. That's
probably enough to explain why it took a long time to finish
initializing; the locking business is likely a red herring.

How big is pg_attribute normally in this database? I'm suspicious that
what triggered this is some major bloating of pg_attribute (and maybe
some of the other catalogs too). Once you had that, relcache cache file
invalidations are a fairly routine occurrence --- I think an autovacuum
on one of the system catalogs would trigger that, for example.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2012-05-24 19:36:39 Re: BUG #6664: Postgres server process does not come up in foreground and parent process is set to init ( PID 1 )
Previous Message karthik ananth 2012-05-24 18:57:46 Re: BUG #6664: Postgres server process does not come up in foreground and parent process is set to init ( PID 1 )