Re: production server down

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: production server down
Date: 2004-12-15 06:25:15
Message-ID: 41BFD8CB.6030902@joeconway.com
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>> Any theories on how we screwed up?
>
> I hesitate to suggest this, but maybe a cron job blindly copying data
> from point A to point B?

Not likely, but I'll check.

> Offhand my bets would revolve around (a) multiple postmasters trying
> to run the same PGDATA directory (we have interlocks to protect
> against this, but I have no faith that they work against an
> NFS-mounted data directory)

This might be possible I suppose. I know we have two init scripts.
Perhaps there is an error in them that caused both postmasters to point
to the same place when the server was rebooted. I'll look them over.

> or (b) you somehow wiped a PGDATA directory and restored it from
> backup tapes underneath a running postmaster.

This seems highly unlikely because our *nix admin would have had to
deliberately do it, and I don't think he'd fail to tell me about
something like that. But all the same, I'll ask him tomorrow.

Assuming the only real problem here is the control data (long shot, I
know), and the actual database files and transaction logs are OK, is
there any reasonable way to reconstruct the correct control data? Or is
that the point at which you use pg_resetxlog?

Joe
