Re: Backends stalled in 'startup' state: index corruption

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Sabino Mullane <greg(at)endpoint(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Backends stalled in 'startup' state: index corruption
Date: 2012-05-24 22:25:05
Message-ID: 2561.1337898305@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Sabino Mullane <greg(at)endpoint(dot)com> writes:
> Oh, almost forgot: reading your reply to the old thread reminded me of
> something I saw in one of the straces right as it "woke up" and left
> the startup state to do some work. Here's a summary:

> 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
> 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
> 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
> (x a gazillion)
> ...
> 12:18:40 brk(0x1c0af000) = 0x1c0af000
> ...(some more semops)...
> 12:18:40 mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac062c98000
> ...(handful of semops)...
> 12:18:40 unlink("base/1554846571/pg_internal.init.11803") = -1 ENOENT (No such file or directory)
> 12:18:40 open("base/1554846571/pg_internal.init.11803", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 13
> 12:18:40 fstat(13, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
> 12:18:40 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac062cd9000
> 12:18:40 write(13, ...
> ...(normalish looking strace output after this)...

Yeah, this is proof that what it was doing is the same as what we saw in
Jeff's backtrace, ie loading up the system catalog relcache entries the
hard way via seqscans on the core catalogs. So the question to be
answered is why that's suddenly a big performance bottleneck. It's not
a cheap operation of course (that's why we cache the results ;-)) but
it shouldn't take minutes either. And, because they are seqscans, it
doesn't seem like messed-up indexes should matter.

The theory I have in mind about Jeff's case is that it was basically an
I/O storm, but it's not clear whether the same explanation works for
your case. There may be some other contributing factor that we haven't
identified yet.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergey Koposov 2012-05-24 22:36:03 Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Previous Message Bruce Momjian 2012-05-24 22:21:54 Re: Per-Database Roles