Re: pg_standby -l might destory the archived file

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_standby -l might destory the archived file
Date: 2009-06-04 08:53:05
Message-ID: 1244105585.23910.153.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Tue, 2009-06-02 at 14:54 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > Tom Lane wrote:
> >> That's a good point; don't we recover files under names like
> >> RECOVERYXLOG, not under names that could possibly conflict with regular
> >> WAL files?
>
> > Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar
> > at the end of recovery, in exitArchiveRecovery().
>
> > Thinking about this some more, I think we should've changed
> > exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more
> > robust if exitArchiveRecovery() always copied the last WAL file rather
> > than just renamed it. It doesn't seem safe to rely on the file the
> > symlink points to to be valid after recovery is finished, and we might
> > write to it before it's recycled, so the current fix isn't complete.
>
> Hmm. I think really the reason it's coded that way is that we assumed
> the recovery command would be physically copying the file from someplace
> else. pg_standby is violating the backend's expectations by using a
> symlink. And I really doubt that the technique is saving anything, since
> the data has to be read in from the archive location anyway.
>
> I'm leaning back to the position that pg_standby's -l option is simply a
> bad idea and should be removed.

ISTM we didn't clearly state what the recovery_command should do either
way. Even if you remove the pg_standby option that will not fix the
problem for people who have written their own script, or existing users
of pg_standby. The safe way is to do as Heikki suggests and copy the
final file into place and I would add that we must then fsync it also.
That should be back-patched possibly as far as 8.0. Documenting a change
is not nearly enough.

Removing -l is a separate discussion. If there was a consensus against
it, I would suggest that we deprecate the option, so that it does
nothing.

As an aside, I would be also much more comfortable if there was an
option to not recycle the WAL files at all, as a safe mode for error
checking at least. The question is: why do we need to zero fill the file
first anyway? We could save the 8 bytes for the prev pointer on every
record if we added enough zeroes onto every WAL write to zap any
pre-existing header data, causing recovery to fail.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-06-04 09:01:28 Re: pg_standby -l might destory the archived file
Previous Message Simon Riggs 2009-06-04 08:38:12 Re: [COMMITTERS] pgsql: Fix LOCK TABLE to eliminate the race condition that could make it