Re: Unable to restart postgres - database system was interrupted

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andy rost <Andy(dot)Rost(at)noaa(dot)gov>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Unable to restart postgres - database system was interrupted
Date: 2006-12-05 20:49:15
Message-ID: 10905.1165351755@sss.pgh.pa.us
Lists: pgsql-general

andy rost <Andy(dot)Rost(at)noaa(dot)gov> writes:
> I'm curious about a couple of things. Why didn't the logs reflect the
> problem that it noticed when it tried to restart on 2006-12-04? (What I
> mean by that is, Postgres thought the server had been interrupted on
> 2006-12-02 16:45, yet the logs for that date and time don't show that
> anything unusual happened.)

Probably nothing did. That message is actually just reporting the
last-update timestamp found in $PGDATA/global/pg_control, which was
probably updated during a routine checkpoint or log segment switch.
IOW it's not the time of a problem, but the time the server was last
known to be functioning normally.
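
One way to look at that timestamp directly is pg_controldata, which dumps
the contents of pg_control as "label: value" lines. Below is a minimal
Python sketch that parses such output. The helper names and the SAMPLE
text are made up for illustration (only the 2006-12-02 16:45 time comes
from this thread), and it assumes pg_controldata is on the PATH.

```python
import subprocess


def parse_controldata(text):
    """Turn pg_controldata's 'label: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; timestamps contain colons too.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def read_controldata(pgdata):
    """Run pg_controldata against a data directory and parse its output."""
    result = subprocess.run(
        ["pg_controldata", pgdata],
        capture_output=True, text=True, check=True,
    )
    return parse_controldata(result.stdout)


# Illustrative output only, NOT taken from the poster's system.
SAMPLE = """\
Database cluster state:               in production
pg_control last modified:             Sat Dec 02 16:45:00 2006
"""

if __name__ == "__main__":
    print(parse_controldata(SAMPLE)["pg_control last modified"])
```

If the "pg_control last modified" value is days older than the last clean
shutdown in the server log, that is the same mismatch described here.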

The question is why do you have a two-day-stale copy of pg_control :-(
... it should certainly have been updated many times since then.
In particular, given your log entries that indicate normal shutdown at
2006-12-04 10:30:11, pg_control *should* have contained a timestamp
equal to that (plus or minus a second or so at most).

> Secondly, how did Postgres know at the restart that a) a problem had
> occurred sometime in the past and b) a specific set of transaction logs
> is required to get back up again?

Again, this is based on the checkpoint pointer found in pg_control;
it wants xlog files starting at where the last checkpoint is alleged
to be by pg_control. It'd seem that pg_control is a lot older than
what is in pg_xlog/. I suspect if you checked the logs you'd find
that 0000000100000065000000F7 corresponds to about 2006-12-02 16:45.
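
For reference, an xlog segment file name is 24 hex digits: 8 for the
timeline ID, 8 for the log file number, and 8 for the segment number
within that log file. A small Python sketch of decoding one (the function
name is just for illustration):

```python
import re


def decode_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if not re.fullmatch(r"[0-9A-Fa-f]{24}", name):
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


if __name__ == "__main__":
    # The segment named in this thread.
    print(decode_wal_name("0000000100000065000000F7"))  # (1, 101, 247)
```

Comparing the names (and file modification times) in pg_xlog/ against the
segment that recovery asks for shows how far behind pg_control is.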

The only previous instances that I can recall of something like this
were in databases that are normally mounted on NFS volumes, and because
of some NFS problem or other the database volume had become dismounted,
leaving the postmaster seeing directories underneath the mount point
on the root volume --- and in particular a different copy of pg_control.
Usually this causes all hell to break loose immediately, though, so if
you hadn't had any signs of trouble or missing data before you stopped
the database, I doubt that could be the explanation.
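
To guard against the dismounted-volume case, a start script could verify
that the data directory really sits on the expected mount before launching
the postmaster. A rough Python sketch, assuming a POSIX system; the
function names and any paths passed in are hypothetical, not something
from the original report:

```python
import os


def mount_point_of(path):
    """Walk upward from path until a mount point is found."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the top without finding a mount
            break
        path = parent
    return path


def data_dir_on_expected_volume(pgdata, expected_mount):
    """True only if expected_mount is mounted and pgdata lives on it."""
    expected = os.path.realpath(expected_mount)
    return os.path.ismount(expected) and mount_point_of(pgdata) == expected
```

If the volume has dropped away, the data directory path resolves to the
root volume instead and the check returns False, so the script can refuse
to start rather than run against the wrong copy of pg_control.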

regards, tom lane
