Re: Something is rotten in the state of Denmark...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Something is rotten in the state of Denmark...
Date: 2015-04-01 23:05:46
Message-ID: 20021.1427929546@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Observe these recent buildfarm failures:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2015-03-21%2000%3A30%3A02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&dt=2015-03-23%2004%3A17%3A01
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2015-03-31%2023%3A30%3A02

> Three similar-looking failures, on two different machines, in a regression
> test that has existed for less than three weeks. Something is very wrong.

I've been able to reproduce this. The triggering event seems to be that
the "VACUUM FULL pg_am" in vacuum.sql has to happen while another backend
is starting up. With a ten-second delay inserted at the bottom of
PerformAuthentication(), it's trivial to hit it manually. The reason we'd
not seen this before the rolenames.sql test was added is that none of the
other tests that run concurrently with vacuum.sql perform mid-test
reconnections, or ever have AFAIR. So as long as they all managed to
start up before vacuum.sql got to the dangerous step, no problem.
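The manual reproduction described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the server has already been patched with the ten-second delay at the bottom of PerformAuthentication(), psql is on the PATH, and the database name "regression", the five-second stagger, and the helper names are hypothetical choices, not anything from the original report. Whether a trivial query on the late-starting connection is enough to surface the failure is also an assumption.

```python
import subprocess
import time


def psql_command(dbname, sql):
    """Build a psql invocation that runs one SQL command and exits."""
    # -X skips ~/.psqlrc so local settings cannot interfere with the run.
    return ["psql", "-X", "-d", dbname, "-c", sql]


def reproduce(dbname="regression", stagger_seconds=5.0):
    """Try to run VACUUM FULL pg_am while another backend is starting up.

    Assumes a server patched with a ten-second sleep at the bottom of
    PerformAuthentication(); dbname and stagger_seconds are hypothetical.
    """
    # Both clients stall in the patched startup path. Launching the VACUUM
    # client first means its command should execute while the client started
    # later is still inside its own startup delay.
    vacuum = subprocess.Popen(
        psql_command(dbname, "VACUUM FULL pg_am"),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    time.sleep(stagger_seconds)
    late_starter = subprocess.Popen(
        psql_command(dbname, "SELECT 1"),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

    # Collect stderr from both sessions; any failure would show up there.
    _, vacuum_err = vacuum.communicate()
    _, late_err = late_starter.communicate()
    return (vacuum.returncode, vacuum_err), (late_starter.returncode, late_err)


if __name__ == "__main__":
    for returncode, stderr_text in reproduce():
        print(returncode, stderr_text.strip())
```

The stagger only needs to be shorter than the injected delay, so that the VACUUM lands inside the second backend's startup window.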

I've not fully tracked it down, but I think that the blame falls on the
MVCC-snapshots-for-catalog-scans patch; it appears that it's trying to
read pg_am's pg_class entry with a snapshot that's too old, possibly
because it assumes that sinval signaling is alive, which I think ain't so.

For even more fun, try "VACUUM FULL pg_class" instead:

psql: PANIC: could not open critical system index 2662

regards, tom lane
