strange behavior (corruption?) of large production database

From: Joe Conway <mail(at)joeconway(dot)com>
To: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: strange behavior (corruption?) of large production database
Date: 2005-12-02 23:44:20
Message-ID: 4390DC54.2050400@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We have very strange behavior from an internal production database.

There are multiple symptoms, all pointing to a problem with clusterwide
tables. For example:

postgres(at)csdfds1:~> psql -U postgres -p 5433 cyspec
Welcome to psql 7.4.8, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit

cyspec=# select version();
version
-------------------------------------------------------------------------------------
PostgreSQL 7.4.8 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3
(SuSE Linux)
(1 row)

postgres(at)csdfds1:~> psql -l
List of databases
Name | Owner | Encoding
----------------+----------+-----------
cyspec | postgres | SQL_ASCII
temp_mike | postgres | SQL_ASCII
temp_mike_new | postgres | SQL_ASCII
temp_mike_orig | postgres | SQL_ASCII
template0 | postgres | SQL_ASCII
template1 | postgres | SQL_ASCII
(6 rows)

cyspec=# select * from pg_database;
datname | datdba | encoding | datistemplate | datallowconn |
datlastsysoid | datvacuumxid | datfrozenxid | datpath | datconfig | datacl
---------+--------+----------+---------------+--------------+---------------+--------------+--------------+---------+-----------+--------
(0 rows)

cyspec=# \l
List of databases
Name | Owner | Encoding
------+-------+----------
(0 rows)

No databases found. Additionally:

cyspec=# select usename, usesysid from pg_shadow;
usename | usesysid
----------+----------
postgres | 1
colxl | 102
colro | 101
l400509 | 105
(4 rows)

cyspec=# \c - colprod
You are now connected as new user "colprod".
cyspec=> \c - colxl
You are now connected as new user "colxl".
cyspec=> \c - colprod
You are now connected as new user "colprod".
cyspec=> \c - zxcvvb
FATAL: user "zxcvvb" does not exist
Previous connection kept

The "colprod" user has disappeared from pg_shadow (there was one
previously, and it was never intentionally dropped), but I can still
connect with that user. The current problem was actually initially found
because pg_dump complained that the owner of the colprod schema didn't
exist.

One more thing:
cyspec=# show wal_sync_method;
wal_sync_method
-----------------
fdatasync
(1 row)

That works, but SHOW ALL and "select * from pg_settings;" return lines
and lines of nothing in psql.
(I mean literally blank lines, not even "(0 rows)")
After issuing \o /tmp/filename the output is all there, and looks normal.

The oddness was first noticed about 3 days after a maintenance shutdown.
As far as I have been told, during the maintenance window, there may
have been OS level package upgrades, and there was a firmware upgrade
done on the storage subsystem (NetApp).

Any advice at what to look at/do would be appreciated. This database is
somewhere around 1.1 TB in size, so dump and reload is not something
we're anxious to do.

Joe

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-12-02 23:44:59 Re: Spam 508
Previous Message Tom Lane 2005-12-02 23:38:01 Re: Numeric 508 datatype