Re: [GENERAL] 8.1.4 - problem with PITR - .backup.done /

From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] 8.1.4 - problem with PITR - .backup.done /
Date: 2006-05-30 21:01:32
Message-ID: 1149022892.980.60.camel@linux.site
Lists: pgsql-general pgsql-hackers

On Tue, 2006-05-30 at 15:38 -0400, Tom Lane wrote:
[.......]
>
> My thought is that the stat()s on the .done file failed for some obscure
> reason, perhaps insufficient kernel resources, even though the file was
> actually there.
>
> If you have postmaster log output for the interval in which this
> happened, it would be interesting to look for occurrences of this
> warning message from pgarch_archiveDone:
>
>     if (rename(rlogready, rlogdone) < 0)
>         ereport(WARNING,
>                 (errcode_for_file_access(),
>                  errmsg("could not rename file \"%s\" to \"%s\": %m",
>                         rlogready, rlogdone)));
>
> If you find any then we might need a different theory ...
>

I did not find any "could not rename file ..." warning messages. These
are the relevant entries in the log file:

--------------------------------------------------------
[2006-05-29 17:31:55.212 CEST] 12022 LOG: archived transaction log
file "00000001000000080000000F"

**** PITR_basebackup script started around 17:32 ****

[2006-05-29 17:40:27.735 CEST] 12022 LOG: archived transaction log
file "000000010000000800000010"
[2006-05-29 17:49:32.075 CEST] 12022 LOG: archived transaction log
file "000000010000000800000011"
[2006-05-29 17:59:40.575 CEST] 12022 LOG: archived transaction log
file "000000010000000800000012"
[2006-05-29 18:08:27.229 CEST] 12022 LOG: archived transaction log
file "000000010000000800000013"
[2006-05-29 18:11:36.434 CEST] 12022 LOG: archived transaction log
file "000000010000000800000010.0006D5E8.backup"

[2006-05-29 18:11:36.467 CEST] 12022 LOG: archive command
"archive_wal.sh -P pg_xlog/000000010000000800000010.0006D5E8.backup -F
000000010000000800000010.0006D5E8.backup" failed: return code 256

[2006-05-29 18:11:37.479 CEST] 12022 LOG: archive command
"archive_wal.sh -P pg_xlog/000000010000000800000010.0006D5E8.backup -F
000000010000000800000010.0006D5E8.backup" failed: return code 256

[2006-05-29 18:11:38.492 CEST] 12022 LOG: archive command
"archive_wal.sh -P pg_xlog/000000010000000800000010.0006D5E8.backup -F
000000010000000800000010.0006D5E8.backup" failed: return code 256

[2006-05-29 18:11:38.492 CEST] 12022 WARNING: transaction log file
"000000010000000800000010.0006D5E8.backup" could not be archived: too
many failures

**** PITR_basebackup script finished 18:12:16 ****
...............................
**** Same error several times until we deleted the .backup.ready file at
18:15 ****

[2006-05-29 18:19:14.546 CEST] 12022 LOG: archived transaction log
file "000000010000000800000014"
[2006-05-29 18:30:10.939 CEST] 12022 LOG: archived transaction log
file "000000010000000800000015"
...............................
--------------------------------------------------------
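
A side note on "return code 256" in the entries above: assuming the
logged number is the raw wait status returned by system() (an assumption
about how the archiver reports it, not something the log itself states),
the conventional Linux encoding keeps the exit status in the high byte,
so 256 would mean that archive_wal.sh exited with status 1. A minimal
sketch of that decoding, with a hypothetical helper name:

```python
def decode_wait_status(status):
    """Decode a raw wait status using the conventional Linux layout.

    Assumption: the low 7 bits hold the terminating signal (0 if the
    process exited normally) and bits 8-15 hold the exit status.
    """
    signal_number = status & 0x7F
    if signal_number == 0:
        # Normal exit: the exit status lives in the high byte.
        return ("exited", (status >> 8) & 0xFF)
    # Terminated by a signal: report the signal number instead.
    return ("signaled", signal_number)


if __name__ == "__main__":
    print(decode_wait_status(256))
```

If that reading is right, the three failures are the script itself
returning 1, not a problem launching it.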

Our PITR_basebackup script does this:

* Checks if Backup label file exists
* Starts Backup process with pg_start_backup()
* Creates an LVM2 Snapshot of data partition
* Mounts the Snapshot partition
* Creates a tar.bz2 file of data
* Unmounts Snapshot partition
* Removes Snapshot LV
* Backs up last WAL file not yet archived
* Stops Backup process with pg_stop_backup()
* Waits for *.backup file to appear under the archivedir
* Removes old WAL archived files
* Removes old tar.bz2 data file
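
The "waits for *.backup file" step is the one that interacts with the
archiver, so here is a rough sketch of how such a wait could be written
with a timeout. This is not our actual script; the function name, the
parameters and the default values are made up for illustration:

```python
import glob
import os
import time


def wait_for_backup_history(archive_dir, timeout_s=300.0, poll_s=1.0):
    """Poll archive_dir until a *.backup file shows up or time runs out.

    Returns the sorted list of matching paths, or an empty list if
    nothing appeared before the timeout expired.
    """
    deadline = time.monotonic() + timeout_s
    pattern = os.path.join(archive_dir, "*.backup")
    while True:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches
        if time.monotonic() >= deadline:
            # Give up instead of blocking the rest of the script forever.
            return []
        time.sleep(poll_s)
```

A caller would treat an empty result as an error and skip the cleanup
steps that follow, so that old WAL files and the old tar.bz2 file are
not removed when the new backup could not be confirmed.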

Creating the tar.bz2 file and deleting old WAL files can take some
time. The total running time of the PITR_basebackup script was 2412 sec
(about 40 minutes).

If we get the same problem again, I will try to get more information
from the system. As I said in my last e-mail, this has been a one-time
problem so far.

regards,
--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
