Re: Problem with pg_compresslog'd archives

From: Karl Denninger <karl(at)denninger(dot)net>
To: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Problem with pg_compresslog'd archives
Date: 2010-02-11 13:48:23
Message-ID: 4B740AA7.6050507@denninger.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Will this come through as a commit on the pgfoundry codebase? I've
subscribed looking for it....

The last edit, if I read the release notes and tracebacks on the
codebase correctly, goes back to the early part of 2009 - which strongly
implies that there are a **LOT** of people out there that could be
running this code with un-restoreable archives!

That, for obvious reasons, could be VERY, VERY bad if someone was to
suffer a system crash....

Koichi Suzuki wrote:
> I found it's pg_compresslog problem (calculation of XNOOP record
> length used in pg_decompresslog). I'm fixing the bug and will
> upload the fix shortly.
>
> Sorry for inconvenience.
>
> ------------------
> Koichi Suzuki
>
> 2010/2/8 Karl Denninger <karl(at)denninger(dot)net>:
>
>> This may belong in a bug report, but I'll post it here first...
>>
>> There appears to be a **SERIOUS** problem with using pg_compresslog and
>> pg_uncompresslog with Postgresql 8.4.2.
>>
>> Here's my configuration snippet:
>>
>> full_page_writes = on # recover from partial page writes
>> wal_buffers = 256kB # min 32kB
>> # (change requires restart)
>> #wal_writer_delay = 200ms # 1-10000 milliseconds
>>
>> #commit_delay = 0 # range 0-100000, in microseconds
>> #commit_siblings = 5 # range 1-1000
>>
>> # - Checkpoints -
>>
>> checkpoint_segments = 64 # in logfile segments, min 1,
>> 16MB each
>> #checkpoint_timeout = 5min # range 30s-1h
>> checkpoint_completion_target = 0.9 # checkpoint target duration,
>> 0.0 - 1.0
>> #checkpoint_warning = 30s # 0 disables
>>
>> archive_command = 'test ! -f /dbms/pg_archive/%f.bz2 && pg_compresslog
>> %p | bzip2 - >/dbms/pg_archive/%f.bz2' #command to use to
>> archive a logfile segment
>>
>> All appears to be fine with the writes, and they are being saved off on
>> the nightly backups without incident.
>>
>> I take a full dump using the instructions in the documentation and make
>> sure I copy the proper "must have" file for consistency to be reached.
>>
>> The problem comes when I try to restore.
>>
>> recovery_conf contains:
>>
>> restore_command = '/usr/local/pgsql/recovery.sh %f %p'
>>
>> And that file contains:
>>
>>
>> #! /bin/sh
>>
>> infile=$1
>> outfile=$2
>>
>> if test -f /dbms/pg_archive/$infile.bz2
>> then
>> bunzip2 -c /dbms/pg_archive/$infile.bz2 |
>> /usr/local/pgsql/bin/pg_decompresslog - $outfile
>> exit 0
>> else
>> exit 1
>> fi
>>
>> ==============
>>
>> The problem is that it appears that some of the segments being saved are
>> no good! On occasion I get this when trying to restore...
>>
>> Feb 7 12:43:51 dbms2 postgres[2001]: [210-1] LOG: restored log file
>> "00000001000001710000009A" from archive
>> Feb 7 12:43:52 dbms2 postgres[2001]: [211-1] LOG: restored log file
>> "00000001000001710000009B" from archive
>> Feb 7 12:43:52 dbms2 postgres[2001]: [212-1] LOG: restored log file
>> "00000001000001710000009C" from archive
>> Feb 7 12:43:52 dbms2 postgres[2001]: [213-1] LOG: restored log file
>> "00000001000001710000009D" from archive
>> Feb 7 12:43:53 dbms2 postgres[2001]: [214-1] LOG: restored log file
>> "00000001000001710000009E" from archive
>> Feb 7 12:43:53 dbms2 postgres[2001]: [215-1] LOG: restored log file
>> "00000001000001710000009F" from archive
>> Feb 7 12:43:54 dbms2 postgres[2001]: [216-1] LOG: restored log file
>> "0000000100000171000000A0" from archive
>> Feb 7 12:43:54 dbms2 postgres[2001]: [217-1] LOG: restored log file
>> "0000000100000171000000A1" from archive
>> Feb 7 12:43:55 dbms2 postgres[2001]: [218-1] LOG: restored log file
>> "0000000100000171000000A2" from archive
>> Feb 7 12:43:55 dbms2 postgres[2001]: [219-1] LOG: restored log file
>> "0000000100000171000000A3" from archive
>> Feb 7 12:43:56 dbms2 postgres[2001]: [220-1] LOG: restored log file
>> "0000000100000171000000A4" from archive
>> Feb 7 12:43:56 dbms2 postgres[2001]: [221-1] LOG: restored log file
>> "0000000100000171000000A5" from archive
>> Feb 7 12:43:57 dbms2 postgres[2001]: [222-1] LOG: restored log file
>> "0000000100000171000000A6" from archive
>> Feb 7 12:43:57 dbms2 postgres[2001]: [223-1] PANIC: corrupted page
>> pointers: lower = 772, upper = 616, special = 0
>> Feb 7 12:43:57 dbms2 postgres[2001]: [223-2] CONTEXT: xlog redo
>> hot_update: rel 1663/616245/1193269; tid 53/93; new 53/4
>> Feb 7 12:43:57 dbms2 postgres[2000]: [1-1] LOG: startup process (PID
>> 2001) was terminated by signal 6: Abort trap
>> Feb 7 12:43:57 dbms2 postgres[2000]: [2-1] LOG: terminating any other
>> active server processes
>>
>> Eek.
>>
>> I assume this means that either A6 or A7 is corrupt. But I have the
>> file both in the restore AND ON THE MACHINE WHERE IT ORIGINATED:
>>
>> On the SOURCE machine (which is running just fine):
>> tickerforum# cksum *171*A[67]*
>> 172998591 830621 0000000100000171000000A6.bz2
>> 1283345296 1541006 0000000100000171000000A7.bz2
>>
>> And off the BACKUP archive, which is what I'm trying to restore:
>>
>> # cksum *171*A[67]*
>> 172998591 830621 0000000100000171000000A6.bz2
>> 1283345296 1541006 0000000100000171000000A7.bz2
>>
>> Identical, says the checksums.
>>
>> This is VERY BAD - if pg_compresslog is damaging the files in some
>> instances then ANY BACKUP TAKEN USING THEM IS SUSPECT AND MAY NOT
>> RESTORE!!!!!!
>>
>> Needless to say this is a MAJOR problem.
>>
>> -- Karl Denninger
>>
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>>
>
>
>
>

Attachment Content-Type Size
karl.vcf text/x-vcard 124 bytes

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message A. Kretschmer 2010-02-11 13:53:23 Re: trouble with unique constraint
Previous Message Justin Graf 2010-02-11 13:47:15 Re: windows7 login- user account