Re: problem with archive_command as suggested by documentation

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: problem with archive_command as suggested by documentation
Date: 2009-01-22 18:36:06
Message-ID: 4978BC96.6090404@enterprisedb.com
Lists: pgsql-hackers

Albe Laurenz wrote:
> The documentation states in
> http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>
> "The archive command should generally be designed to refuse to overwrite any pre-existing archive file."
>
> and suggests an archive_command like "test ! -f .../%f && cp %p .../%f".
>
> We ran into (small) problems with an archive_command similar to this
> as follows:
>
> The server received a fast shutdown request while a WAL segment was being archived.
> The archiver stopped and left behind a half-written archive file.

Hmm, if I'm reading the code correctly, a fast shutdown request
shouldn't kill an ongoing archive command.

> Now when the server was restarted, the archiver tried to archive the same
> WAL segment again and got an error because the destination file already
> existed.
>
> That means that WAL archiving is stuck until somebody manually removes
> the partial archived file.

Yeah, that's a good point. Even if it turns out that the reason for your
partial write wasn't the fast shutdown request, the archive_command
could be interrupted for some other reason and leave a partially
written file behind.

> I suggest that the documentation be changed so that it does not
> recommend this setup. WAL segment names are unique anyway.

Well, the documentation states the reason to do that:

> This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory)

which seems like a reasonable concern too. Perhaps it should suggest
something like:

test ! -f .../%f && cp %p .../%f.tmp && mv .../%f.tmp .../%f

i.e., copy under a different filename first, and rename the file into
place once it's completely written, assuming that mv is atomic. It gets
a bit complicated, though.
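
If the one-liner gets too unwieldy, the same idea could be moved into a
small shell function. What follows is only a sketch under assumptions,
not something taken from the documentation: the function name
archive_wal and the ARCHIVE_DIR variable are made up for illustration,
and the temporary file is deliberately kept in the same directory as the
final file so that mv is a plain rename within one filesystem.

```shell
#!/bin/sh
# Hypothetical sketch: archive_wal and ARCHIVE_DIR are illustrative names,
# not part of PostgreSQL. ARCHIVE_DIR must point at the archive directory.

archive_wal() {
    src=$1                      # value of %p: path of the WAL segment
    name=$2                     # value of %f: file name of the WAL segment
    dest="$ARCHIVE_DIR/$name"
    tmp="$dest.tmp"

    # Refuse to overwrite a completed archive file (the safety check
    # the documentation asks for).
    if [ -f "$dest" ]; then
        echo "archive_wal: $dest already exists" >&2
        return 1
    fi

    # Copy under a temporary name. A stale .tmp file left over from an
    # interrupted run is simply overwritten, so archiving cannot get stuck
    # on it.
    cp "$src" "$tmp" || return 1

    # Rename into place only after the copy has finished. Both names are
    # in the same directory, so this is a rename on one filesystem.
    mv "$tmp" "$dest"
}
```

A wrapper script containing this function would end with
archive_wal "$1" "$2" and be invoked from archive_command with %p and %f
as its two arguments. Note that the sketch does not fsync the copied
file, and it still lets two servers race on the same .tmp name, so it
narrows the problem rather than eliminating it.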

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
