Re: ERROR: catalog is missing 9 attribute(s) for relid 10297

From: "O'Shea, Brendan" <boshea(at)akamai(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: catalog is missing 9 attribute(s) for relid 10297
Date: 2008-01-04 19:50:00
Message-ID: F6331254DAFC0041BEAE36A51D295F92328BF1@MAVS2.kendall.corp.akamai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Lane, Tom" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:>
> "O'Shea, Brendan" <boshea(at)akamai(dot)com> writes:
> >> "Lane, Tom" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> Ugh. Does it work if you do
> >> export PGOPTIONS="--ignore_system_indexes=1"
>
> > I tried that, but unfortunately pg_dump still fails to run
> and the error
> > message is identical to previous attempts.
>
> Huh. So it's not index corruption then. Table corruption is still a
> possibility but it seems likely that you'd be getting other errors
> during the seqscan that looks for the rows. This suggests
> that the rows
> actually disappeared from pg_attribute, which is a bit hard to credit.
> The only mechanisms I can think of are that VACUUM decided they were
> dead or something physically truncated the table. The latter would
> probably have zapped a lot of other rows though.
>

The auto vacuum daemon is running pretty frequently on the database
since one of our tables is very heavily updated, about 50-200 per
second, and the rows themselves are large, about 1k in size on average
and about 7000 rows in the table. We also run a nightly 'vacuum full'
operation on this table since it can sometimes grow fairly large (around
50-100MB or more) if there are any long running transactions that last
20-30 minutes or more. Not sure if this is related but thought it was
worth a mention.

> Have you checked to see if pg_roles is the only relation with this
> problem? Try
> select c.relname from pg_class c left join pg_attribute a
> on a.attrelid = c.oid and a.attnum > 0
> group by c.oid,c.relname,c.relnatts having count(*) != c.relnatts;
>

This SQL produces the output:
"pg_roles"
"pg_group"
"pg_rules"
"pg_user"
"pg_shadow"

Querying each table produces the error:
"pg_roles" => ERROR: catalog is missing 9 attribute(s) for relid 10297
"pg_group" => ERROR: catalog is missing 3 attribute(s) for relid 10303
"pg_rules" => ERROR: invalid attribute number 0 for pg_rules
"pg_user" => ERROR: catalog is missing 8 attribute(s) for relid 10306
"pg_shadow" => ERROR: catalog is missing 8 attribute(s) for relid 10300

> > Any suggestions for additional logging we might turn on to help
> > determine the cause of this issue?
>
> Maybe VACUUM VERBOSE on pg_attribute? Although you'd have to
> get lucky
> enough to catch the time that it zapped the rows, if that's what the
> problem is.
>

So are you saying that on our newly installed system we could
periodically run "VACUUM VERBOSE pg_attribute", append the output to a
log file, and then after the catalog error starts happening again we
could go to this log file to look for potential clues?

Running "VACUUM VERBOSE pg_attribute" now produces:

INFO: vacuuming "pg_catalog.pg_attribute"
INFO: scanned index "pg_attribute_relid_attnam_index" to remove 30 row
versions
DETAIL: CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO: scanned index "pg_attribute_relid_attnum_index" to remove 30 row
versions
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_attribute": removed 30 row versions in 1 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_attribute_relid_attnam_index" now contains 2670 row
versions in 48 pages
DETAIL: 30 index row versions were removed.
11 index pages have been deleted, 11 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_attribute_relid_attnum_index" now contains 2670 row
versions in 13 pages
DETAIL: 30 index row versions were removed.
2 index pages have been deleted, 2 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_attribute": found 30 removable, 2670 nonremovable row
versions in 62 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 965 unused item pointers.
20 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.04 sec.

Query returned successfully with no result in 500 ms.

Thanks for your help,
Brendan

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-01-04 19:57:07 Re: postgres 8.3 betat 1 version download
Previous Message Chris Browne 2008-01-04 19:39:55 Re: [OT] Slony (initial) Replication - Slow