Re: Backends stalled in 'startup' state: index corruption

From: Greg Sabino Mullane <greg(at)endpoint(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Jeff Frost <jeff(at)pgexperts(dot)com>
Subject: Re: Backends stalled in 'startup' state: index corruption
Date: 2012-05-26 15:17:14
Message-ID: 20120526151714.GC10277@tinybird.home
Lists: pgsql-hackers

On Fri, May 25, 2012 at 07:02:42PM -0400, Tom Lane wrote:
> However, the remaining processes trying to
> compute new init files would still have to complete the process, so I'd
> expect there to be a diminishing effect --- the ones that were stalling
> shouldn't all release exactly together. Unless there is some additional
> effect that's syncing them all. (I wonder for instance if the syncscan
> logic is kicking in here.)

How fast would you expect that to happen? As far as I could tell, they all
released at once, or at least within about 15 seconds of each other;
I wasn't running ps constantly. I could check the logs and get a better
figure if you think it's an important data point.

> One interesting question is why there's a thundering herd of new
> arrivals in the first place. IIRC you said you were using a connection
> pooler. I wonder if it has a bug^H^H^Hdesign infelicity that makes it
> drop and reopen all its connections simultaneously.

No, we are not. Or rather, there is some pooling, but there is also a
fairly large influx of new connections. As far as I could tell, the
few existing connections were not affected.

> 1. Somebody decides to update one of those rows, and it gets dropped in
> some remote region of the table. The only really plausible reason for
> this is deciding to fool with the column-specific stats target
> (attstattarget) of a system catalog. Does that sound like something
> either of you might have done?

No, zero chance of this, barring some rogue intruder on the network
with a strange sense of humor.

> pg_attribute just enough smaller to avoid the scenario. Not sure about
> Greg's case, but he should be able to tell us the size of pg_attribute
> and his shared_buffers setting ...

pg_attribute is around 5MB (+6MB of indexes); shared_buffers is 4GB. However,
there is a *lot* of churn in pg_attribute and pg_class, mostly due
to lots of temporary tables.

P.S. Hmmm, that's weird: I just double-checked the above, and pg_attribute
is now 52MB (+70MB of indexes); the original figures were from yesterday.
At any rate, that is nowhere near 1/4 of shared_buffers.
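For reference, here is a rough sketch of that comparison. It assumes the "1/4 of shared_buffers" figure refers to the NBuffers / 4 cutoff above which a heap seqscan becomes eligible for synchronized scanning, and it assumes the default 8kB block size. The helper names are hypothetical. Only the heap size of pg_attribute is compared, not its indexes.

```python
# Sketch (assumptions: default 8kB BLCKSZ; a heap seqscan is eligible for
# synchronized scanning only when the relation exceeds NBuffers / 4 blocks).

BLOCK_SIZE = 8192  # bytes; assumed default BLCKSZ
MB = 1024 * 1024
GB = 1024 * MB


def blocks(nbytes: int) -> int:
    """Number of BLOCK_SIZE pages needed to hold nbytes (ceiling division)."""
    return -(-nbytes // BLOCK_SIZE)


def syncscan_eligible(rel_bytes: int, shared_buffers_bytes: int) -> bool:
    """True if the relation is larger than NBuffers / 4 blocks."""
    nbuffers = shared_buffers_bytes // BLOCK_SIZE
    return blocks(rel_bytes) > nbuffers // 4


def threshold_mb(shared_buffers_bytes: int) -> float:
    """The NBuffers / 4 cutoff expressed in megabytes."""
    return (shared_buffers_bytes // BLOCK_SIZE // 4) * BLOCK_SIZE / MB


shared_buffers = 4 * GB
for label, size in [("pg_attribute (yesterday)", 5 * MB),
                    ("pg_attribute (today)", 52 * MB)]:
    print(f"{label}: {size / MB:.0f}MB vs threshold "
          f"{threshold_mb(shared_buffers):.0f}MB -> "
          f"eligible={syncscan_eligible(size, shared_buffers)}")
```

With shared_buffers at 4GB the cutoff works out to 1024MB, so even the 52MB figure is roughly 20x too small to qualify.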

--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8
