Re: A nightmare

From: Mauri Sahlberg <Mauri(dot)Sahlberg(at)claymountain(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: A nightmare
Date: 2005-05-04 09:47:57
Message-ID: 1115200077.11341.84.camel@localhost.localdomain
Lists: pgsql-admin

Mon, 2005-05-02 at 10:52 -0400, Tom Lane wrote:
> Mauri Sahlberg <Mauri(dot)Sahlberg(at)claymountain(dot)com> writes:
> > I'm starting to become desperate. On saturday I dumped all databases,
> > wiped whole postgresql installation. Installed newest rpms for Fedora 1,
> > restored databases. Recompiled client libraries and binaries. Restarted
> > and after five hours of operation:
> > May 1 21:34:19 claymountain postgres[6337]: [2-1] ERROR: could not access status of transaction 4250811410
> > May 1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory
>
> Which exactly are the "newest rpms for Fedora 1" ... what PG version
> and where did you get them from?
>
Name        : postgresql-server    Relocations: (not relocateable)
Version     : 7.4.7                Vendor: (none)
Release     : 2PGDG                Build Date: Fri 25 Feb 2005 01:42:54 PM EET

Got them from
http://www.postgresql.org/ftp/binary/v7.4.7/rpms/fedora/fedora-core-1/

> It looks like a corrupt-data issue to me. You could follow the usual
> sorts of procedures to try to isolate and get rid of the bad data
> (see the list archives for details). But I think first you need to
> question what caused it. Could your disk drive be failing (or other
> hardware problem)? How much do you trust the specific kernel version
> you are currently running?

I have no control over the kernel version I am running. The server is
located on a virtual machine, and the kernel reports itself as Linux
claymountain.planeetta.com 2.4.20-021stab028.5.777-enterprise #1 SMP Tue
Feb 22 17:44:46 MSK 2005 i686 i686 i386 GNU/Linux. I have no particular
reason to trust or distrust it.

I've tried to contact the virtual server provider, but so far the guy who
is supposed to know something about virtual servers has not been in and
is not returning my calls. As far as I can tell, the hardware "looks"
fine, at least as seen from inside the virtual server.

I moved the database that seemed to cause the corruption to another
machine and now both servers have been happily running for more than 24
hours without any indication of data corruption.

I am happy but scared.

I would still like to know what caused the corruption. My current guess
is that it could be network related. The corruption occurred while the
data was being collected on a different machine from the one hosting the
database. The collector is a C++ application using libpq++-4.0. The
corruption could have something to do with locales and network errors.
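
For reference, the pg_clog file name in the error quoted above appears to
follow directly from the transaction ID. The sketch below is only an
illustration, not something taken from the PostgreSQL source. It assumes
the default clog layout of that era: 8192-byte pages, 2 status bits per
transaction (4 per byte) and 32 pages per segment file. The function name
clog_segment_name is made up for this example.

```python
# Assumed clog layout constants (default 8 kB block size); these are
# assumptions for illustration, not values read from the server.
BLCKSZ = 8192            # bytes per clog page
XACTS_PER_BYTE = 4       # 2 status bits per transaction
PAGES_PER_SEGMENT = 32   # pages per pg_clog segment file

# Number of transaction IDs covered by one segment file.
XACTS_PER_SEGMENT = BLCKSZ * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Return the pg_clog segment file name that would hold this xid's status."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


# Transaction ID from the error message in the quoted log.
print(clog_segment_name(4250811410))
```

Under these assumptions the transaction ID 4250811410 maps to segment
0FD5, the same file the server could not open. A freshly restored
cluster would not normally have reached transaction IDs that high after
five hours, which fits the suggestion that some row carries a garbage
transaction ID rather than that a real clog file has gone missing.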

Regards,
Mauri Sahlberg
