Re: BUG #3110: Online Backup introduces Duplicate OIDs

From: Randy Isbell <jisbell(at)cisco(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3110: Online Backup introduces Duplicate OIDs
Date: 2007-03-13 14:03:26
Message-ID: 5F2CCCB7-B2C5-4CA9-852F-CBD05251B80F@cisco.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On Mar 12, 2007, at 8:08 PM, Tom Lane wrote:

> Randy Isbell <jisbell(at)cisco(dot)com> writes:
>> One thing I do find interesting: while the backup created with my
>> "online" / Chap 23.3 processing contains 18 WAL files, only the last
>> 3 (most recent) are used during the recovery process. Is that
>> normal?
>
> Hmm, no, that seems pretty suspicious. The thing *should* be starting
> from the checkpoint created by pg_start_backup(), but that observation
> makes it sound like it is starting from some later one, probably
> whatever happened to be current when pg_control got archived. Please
> check this by noting what the startup messages say about where replay
> commences, and compare to the backup label file (plain text, you can
> just look at it) and pg_control (use pg_controldata before starting
> the
> replay).

Here is the pg_controldata output for the saved cluster prior to
startup:

pg_control version number: 822
Catalog version number: 200611241
Database system identifier: 5038266934557908351
Database cluster state: in production
pg_control last modified: Tue Mar 6 12:43:53 2007
Current log file ID: 2
Next log file segment: 6
Latest checkpoint location: 2/3035F48
Prior checkpoint location: 2/292EC
Latest checkpoint's REDO location: 2/3000840
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/18823426
Latest checkpoint's NextOID: 8774996
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Tue Mar 6 12:41:41 2007
Minimum recovery ending location: 0/0
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Date/time type storage: 64-bit integers
Maximum length of locale name: 128
LC_COLLATE: C
LC_CTYPE: C

Here is the backup label file:
# cat 0000000100000001000000F3.00FB4C3C.backup
START WAL LOCATION: 1/F3FB4C3C (file 0000000100000001000000F3)
STOP WAL LOCATION: 2/56912D8 (file 000000010000000200000005)
CHECKPOINT LOCATION: 1/F3FB4C3C
START TIME: 2007-03-06 12:25:05 CST
LABEL: /usr/local/bck/db/sn200703061224.tar.gz
STOP TIME: 2007-03-06 12:44:27 CST

Here are the startup messages:

<2007-03-12 10:39:10 CDT 82475>LOG: could not create socket for
statistics collector: Protocol not supported
<2007-03-12 10:39:10 CDT 82475>LOG: trying another address for the
statistics collector
<2007-03-12 10:39:10 CDT 82889>LOG: database system was interrupted
at 2007-03-06 12:43:53 CST
<2007-03-12 10:39:10 CDT 82889>LOG: starting archive recovery
<2007-03-12 10:39:10 CDT 82889>LOG: restore_command = "cp /usr/local/
pgsql/walarch/%f %p"
cp: /usr/local/pgsql/walarch/00000001.history: No such file or directory
<2007-03-12 10:39:10 CDT 82889>LOG: restored log file
"000000010000000200000003" from archive
<2007-03-12 10:39:10 CDT 82889>LOG: checkpoint record is at 2/3035F48
<2007-03-12 10:39:10 CDT 82889>LOG: redo record is at 2/3000840;
undo record is at 0/0; shutdown FALSE
<2007-03-12 10:39:10 CDT 82889>LOG: next transaction ID: 0/18823426;
next OID: 8774996
<2007-03-12 10:39:10 CDT 82889>LOG: next MultiXactId: 1; next
MultiXactOffset: 0
<2007-03-12 10:39:10 CDT 82889>LOG: automatic recovery in progress
<2007-03-12 10:39:10 CDT 82889>LOG: redo starts at 2/3000840
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3000840; LSN 2/30024C0:
prev 2/3000818; xid 18823427; bkpb1: Heap - insert: rel
1663/16390/22670; tid 211239/23
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/30024C0; LSN 2/3004514:
prev 2/3000840; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8358831; tid 9103/77
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3004514; LSN 2/3005E9C:
prev 2/30024C0; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359008; tid 36219/100
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3005E9C; LSN 2/3007958:
prev 2/3004514; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359009; tid 15905/133
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3007958; LSN 2/3008AAC:
prev 2/3005E9C; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359011; tid 13711/14
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3008AAC; LSN 2/3009D98:
prev 2/3007958; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359012; tid 15516/235
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3009D98; LSN 2/300B0D4:
prev 2/3008AAC; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359024; tid 12394/300
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/300B0D4; LSN 2/300B0FC:
prev 2/3009D98; xid 18823427: Transaction - commit: 2007-03-06 12:41:41

From the backup label file it seems the starting checkpoint record
should be at 1/F3FB4C3C. However the pg_control info indicates
2/3035F48. Which is correct?

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2007-03-13 15:40:53 Re: BUG #3110: Online Backup introduces Duplicate OIDs
Previous Message Алексей Опанасенко 2007-03-13 13:09:55 Installation bug