Re: odd output in restore mode

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: odd output in restore mode
Date: 2008-05-12 22:14:00
Message-ID: 1210630440.29684.249.camel@ebony.site
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-05-12 at 16:57 -0400, Andrew Dunstan wrote:

> I have just been working on setting up a continuous recovery failover
> system, and noticed some odd log lines, shown below. (Using 8.3).

Hmmm, well, the first time you use something complex, there are some
surprising features, I guess. Most especially the log lines are there to
allow production issues to be diagnosed, not to create a beautiful log.

Many of the things that look somewhat strange are there for a reason,
since a wide range of options and save-your-customers-ass scenarios are
covered by the recovery code.

Suggestions for improvement are always welcome and you are welcome to
suggest doc changes, as many people do.

> First note that our parsing of recovery.conf in xlog.c is pretty bad,
> and at least we need to document the quirks if it's not going to be
> fixed. log_restartpoints is said to be boolean, but when I set it to an
> unquoted true I got a fatal error, while a quoted 'on' sets it to false,
> as seen. Ick.

Yes, some improvements are definitely due there.

> What is more, I apparently managed to get the recovery
> server to lose a WAL file and hang totally by having a bad
> recovery.conf. Triple ick.

Sounds like a bug you should report in the normal way. Correctness is
paramount. Or are you mistaking the log message about file AA for an
error?

> Second, what is all this about .history files? My understanding is that
> they are not necessary, so surely we should try to stat them to see if
> they are present before trying to copy them. These lines are going to
> confuse a lot of people, I suspect (including me).

I try to keep it as simple as possible, since much of this code only
gets run when you really need it to work. The request for the .history
file and the cp are examples of that.

> Lastly, not quite related to this output, but in the same general area,
> should we have an option on pg_standby to allow removing the archive
> file after it has been restored?

There already is one, but it's more complex than that. (%r)
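
To illustrate why %r matters for cleanup: it gives the restore command the
name of the file containing the last restartpoint, so any archived segment
that sorts before it is no longer needed for a restart. The following is a
rough sketch of that idea only, not pg_standby's actual code. The function
name `removable_segments` is hypothetical, and it assumes all segments are
on one timeline so that a plain string comparison of the names is enough.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def removable_segments(archive_files, restart_file):
    """Return archived WAL segments that sort before the restart file.

    Assumptions (not taken from pg_standby itself): only plain
    24-character hex names count as WAL segments, so .backup and
    .history files are never candidates, and all names share a timeline.
    """
    return sorted(
        name
        for name in archive_files
        if len(name) == 24
        and set(name) <= HEX_DIGITS
        and name < restart_file
    )


archive = [
    "0000000100000000000000A5",
    "0000000100000000000000A5.00000068.backup",
    "0000000100000000000000A6",
    "0000000100000000000000A7",
]
print(removable_segments(archive, "0000000100000000000000A6"))
# ['0000000100000000000000A5']
```

The restart file itself and everything after it are kept, which is why a
simple "delete after restore" switch would not be safe.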

> LOG: database system was interrupted; last known up at 2008-05-12
> 15:18:23 EDT
> LOG: starting archive recovery
> LOG: log_restartpoints = false
> LOG: restore_command = '../bin/pg_standby -t ../common_archive/failover
> ../common_archive %f %p %r '
> cp: cannot stat `../common_archive/00000001.history': No such file or
> directory
> cp: cannot stat `../common_archive/00000001.history': No such file or
> directory
> cp: cannot stat `../common_archive/00000001.history': No such file or
> directory
> LOG: restored log file "0000000100000000000000A5.00000068.backup" from
> archive
> LOG: restored log file "0000000100000000000000A5" from archive
> LOG: automatic recovery in progress
> LOG: redo starts at 0/A50000B0
> LOG: restored log file "0000000100000000000000A6" from archive
> LOG: restored log file "0000000100000000000000A7" from archive
> LOG: restored log file "0000000100000000000000A8" from archive
> LOG: restored log file "0000000100000000000000A9" from archive
> trigger file found
> LOG: could not open file "pg_xlog/0000000100000000000000AA" (log file
> 0, segment 170): No such file or directory
> LOG: redo done at 0/A9000068
> LOG: restored log file "0000000100000000000000A9" from archive
> cp: cannot stat `../common_archive/00000002.history': No such file or
> directory
> cp: cannot stat `../common_archive/00000002.history': No such file or
> directory
> cp: cannot stat `../common_archive/00000002.history': No such file or
> directory
> LOG: selected new timeline ID: 2
> cp: cannot stat `../common_archive/00000001.history': No such file or
> directory
> cp: cannot stat `../common_archive/00000001.history': No such file or
> directory
> cp: cannot stat `../common_archive/00000001.history': No such file or
> directory
> LOG: archive recovery complete
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
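
For readers puzzling over the file names in the log above, here is a small
sketch of how they decode. It assumes the naming used in this release: a
WAL segment name is three 8-digit hex fields (timeline, log file, segment)
and a timeline history file is the 8-digit hex timeline ID plus ".history".
The helper names `parse_wal_name` and `history_name` are made up for this
illustration.

```python
import re

# Assumed layout of a WAL segment file name: three 8-digit hex fields.
_WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Split a 24-character WAL segment name into (timeline, log, segment)."""
    m = _WAL_NAME.match(name)
    if m is None:
        raise ValueError("not a WAL segment file name: %r" % name)
    return tuple(int(field, 16) for field in m.groups())


def history_name(timeline):
    """Name of the timeline history file for a given timeline ID."""
    return "%08X.history" % timeline


print(parse_wal_name("0000000100000000000000AA"))  # (1, 0, 170)
print(history_name(2))  # 00000002.history
```

This matches the "log file 0, segment 170" message for segment AA, and
shows that the 00000001.history and 00000002.history requests are for
timelines 1 and 2.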

There is an outstanding Windows issue with pg_standby, listed on the
latest commitfest page, that I would appreciate your help with. I don't
maintain a Windows dev environment.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
