Losing data from Postgres

From: Paul Breen <pbreen(at)computerpark(dot)co(dot)uk>
To: "pgsql-admin(at)postgreSQL(dot)org" <pgsql-admin(at)postgreSQL(dot)org>
Subject: Losing data from Postgres
Date: 2000-11-15 15:24:54
Message-ID: Pine.LNX.3.96.1001115152350.6198F-100000@cpark37.computerpark.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello everyone,

Can anyone help us?

We are using Postgres in a hotspare configuration, that is, we have 2
separate boxes both running identical versions of Postgres and everytime
we insert|update|delete from the database we write to both boxes (at the
application level). All communications to the databases are in
transaction blocks and if we cannot commit to both databases then we
rollback.

Recently we have discovered that some records are missing from the main
database table on the master box but are present in the table on the
spare. We know its not a transaction problem (rolling back on master but
not spare) because reports printed at the time of the records being
written show the data in the master. It seems that some time afterwards
the records just disappear from the master! Both boxes are raided (Raid
5), don't know if this is significant.

Originally we were vacuuming twice a day but because some of the reports
we produce regularly were taking too long as the database grew, we added
multiple indexes onto the key tables and began vacuuming every hour. It's
only after doing this that we noticed the data loss - don't know if this
is coincidental or not. Yesterday we went back to vacuuming only twice a
day.

Looking in Postgres' server log we found some worrying error messages.
Again, don't know if these are significant:

1000822.00:39:44.676 [5182] NOTICE: LockRelease: locktable lookup failed,
no lock

515 instances on the master, 7 on the spare.

1000929.15:33:40.454 [20007] NOTICE: LockReplace: xid table corrupted

80 instances on the master, 0 on the spare - is this the problem?

1001027.11:55:04.071 [4847] NOTICE: Index dttmprod_ndx: NUMBER OF INDEX'
TUPLES (33586) IS NOT THE SAME AS HEAP' (33578)

1065 instances on the master, 20 on the spare. The reason there is so
many of these messages is that we didn't spot this error until after we
had done a lot of vacuums. Since rebuilding the indexes (dropping &
re-creating) we haven't seen this one again.

We are using Postgres version 6.5.3. I know its old but this is the
customer standard and we are loathe to upgrade unless its a problem
inherent with this version. From a cost-of-ownership viewpoint, its
easier to support a single version than multiple versions. The OS is SuSE
Linux 6.3 (kernel 2.2.13), the control system programs talk to Postgres
via libpq, the web interface is PHP3 and the Tcl/Tk interface is pgtksh.
The Raid controller is a DPT SmartCache 4 (28 MB RAM). Both the master
and the spare boxes are Intel Pentium III 550Mhz with 256MB RAM.

If we've left anything out, write and we'll supply more information.

Any help, especially around Postgres' server log messages would be greatly
appreciated.

Paul M. Breen, Software Engineer - Computer Park Ltd.

Tel: (01536) 417155
Email: pbreen(at)computerpark(dot)co(dot)uk

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Jean-Marc Pigeon 2000-11-15 16:02:08 Re: Losing data from Postgres
Previous Message Elias 2000-11-15 13:22:59 Read please