Skip site navigation (1) Skip section navigation (2)

Re: BUG #3110: Online Backup introduces Duplicate OIDs

From: Randy Isbell <jisbell(at)cisco(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3110: Online Backup introduces Duplicate OIDs
Date: 2007-03-13 14:03:26
Message-ID: 5F2CCCB7-B2C5-4CA9-852F-CBD05251B80F@cisco.com (view raw or flat)
Thread:
Lists: pgsql-bugs
On Mar 12, 2007, at 8:08 PM, Tom Lane wrote:

> Randy Isbell <jisbell(at)cisco(dot)com> writes:
>> One thing I do find interesting:  while the backup created with my
>> "online" / Chap 23.3 processing contains 18 WAL files, only the last
>> 3 (most recent) are used during the recovery process.  Is that  
>> normal?
>
> Hmm, no, that seems pretty suspicious.  The thing *should* be starting
> from the checkpoint created by pg_start_backup(), but that observation
> makes it sound like it is starting from some later one, probably
> whatever happened to be current when pg_control got archived.  Please
> check this by noting what the startup messages say about where replay
> commences, and compare to the backup label file (plain text, you can
> just look at it) and pg_control (use pg_controldata before starting  
> the
> replay).

Here is the pg_controldata output for the saved cluster prior to  
startup:

pg_control version number:            822
Catalog version number:               200611241
Database system identifier:           5038266934557908351
Database cluster state:               in production
pg_control last modified:             Tue Mar  6 12:43:53 2007
Current log file ID:                  2
Next log file segment:                6
Latest checkpoint location:           2/3035F48
Prior checkpoint location:            2/292EC
Latest checkpoint's REDO location:    2/3000840
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/18823426
Latest checkpoint's NextOID:          8774996
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            Tue Mar  6 12:41:41 2007
Minimum recovery ending location:     0/0
Maximum data alignment:               4
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               64-bit integers
Maximum length of locale name:        128
LC_COLLATE:                           C
LC_CTYPE:                             C


Here is the backup label file:
# cat 0000000100000001000000F3.00FB4C3C.backup
START WAL LOCATION: 1/F3FB4C3C (file 0000000100000001000000F3)
STOP WAL LOCATION: 2/56912D8 (file 000000010000000200000005)
CHECKPOINT LOCATION: 1/F3FB4C3C
START TIME: 2007-03-06 12:25:05 CST
LABEL: /usr/local/bck/db/sn200703061224.tar.gz
STOP TIME: 2007-03-06 12:44:27 CST


Here are the startup messages:

<2007-03-12 10:39:10 CDT 82475>LOG:  could not create socket for  
statistics collector: Protocol not supported
<2007-03-12 10:39:10 CDT 82475>LOG:  trying another address for the  
statistics collector
<2007-03-12 10:39:10 CDT 82889>LOG:  database system was interrupted  
at 2007-03-06 12:43:53 CST
<2007-03-12 10:39:10 CDT 82889>LOG:  starting archive recovery
<2007-03-12 10:39:10 CDT 82889>LOG:  restore_command = "cp /usr/local/ 
pgsql/walarch/%f %p"
cp: /usr/local/pgsql/walarch/00000001.history: No such file or directory
<2007-03-12 10:39:10 CDT 82889>LOG:  restored log file  
"000000010000000200000003" from archive
<2007-03-12 10:39:10 CDT 82889>LOG:  checkpoint record is at 2/3035F48
<2007-03-12 10:39:10 CDT 82889>LOG:  redo record is at 2/3000840;  
undo record is at 0/0; shutdown FALSE
<2007-03-12 10:39:10 CDT 82889>LOG:  next transaction ID: 0/18823426;  
next OID: 8774996
<2007-03-12 10:39:10 CDT 82889>LOG:  next MultiXactId: 1; next  
MultiXactOffset: 0
<2007-03-12 10:39:10 CDT 82889>LOG:  automatic recovery in progress
<2007-03-12 10:39:10 CDT 82889>LOG:  redo starts at 2/3000840
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/3000840; LSN 2/30024C0:  
prev 2/3000818; xid 18823427; bkpb1: Heap - insert: rel  
1663/16390/22670; tid 211239/23
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/30024C0; LSN 2/3004514:  
prev 2/3000840; xid 18823427; bkpb1: Btree - insert: rel  
1663/16390/8358831; tid 9103/77
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/3004514; LSN 2/3005E9C:  
prev 2/30024C0; xid 18823427; bkpb1: Btree - insert: rel  
1663/16390/8359008; tid 36219/100
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/3005E9C; LSN 2/3007958:  
prev 2/3004514; xid 18823427; bkpb1: Btree - insert: rel  
1663/16390/8359009; tid 15905/133
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/3007958; LSN 2/3008AAC:  
prev 2/3005E9C; xid 18823427; bkpb1: Btree - insert: rel  
1663/16390/8359011; tid 13711/14
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/3008AAC; LSN 2/3009D98:  
prev 2/3007958; xid 18823427; bkpb1: Btree - insert: rel  
1663/16390/8359012; tid 15516/235
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/3009D98; LSN 2/300B0D4:  
prev 2/3008AAC; xid 18823427; bkpb1: Btree - insert: rel  
1663/16390/8359024; tid 12394/300
<2007-03-12 10:39:10 CDT 82889>LOG:  REDO @ 2/300B0D4; LSN 2/300B0FC:  
prev 2/3009D98; xid 18823427: Transaction - commit: 2007-03-06 12:41:41


 From the backup label file it seems the starting checkpoint record  
should be at 1/F3FB4C3C.  However the pg_control info indicates  
2/3035F48.  Which is correct?

In response to

Responses

pgsql-bugs by date

Next:From: Tom LaneDate: 2007-03-13 15:40:53
Subject: Re: BUG #3110: Online Backup introduces Duplicate OIDs
Previous:From: Алексей ОпанасенкоDate: 2007-03-13 13:09:55
Subject: Installation bug

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group