Re: BUG #3110: Online Backup introduces Duplicate OIDs

From: Randy Isbell <jisbell(at)cisco(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3110: Online Backup introduces Duplicate OIDs
Date: 2007-03-12 18:46:35
Message-ID: 90F94732-716D-4D27-B219-073EEE81CE13@cisco.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On Mar 10, 2007, at 5:02 PM, Tom Lane wrote:

> "Randy Isbell \(jisbell\)" <jisbell(at)cisco(dot)com> writes:
>> I found that if I take an offline backup created around the same
>> time as
>> my online backup, roll forward the transaction log files included
>> in the
>> offline backup using a recovery.conf file, the duplicate records
>> do NOT
>> exist.
>> Therefore it seems there is no corruption in the WAL files. The
>> problem
>> must be in the PITR processing of the online backup file.
>
> ... or there's something wrong with your backup procedure.
>
> I hadn't looked closely at that point before, but I see you
> describe it
> as
>
>> 3. Issue pg_start_backup()
>> 4. Save off the data cluster
>> 5. Issue pg_stop_backup()
>> 6. Collect the WAL files
>> 7. Create a big hairy tar file with the stuff from items 4 and 6.
>> 8. Take the big hairy tar file to another server running the same pg
>> 8.2.3, untar and start postgres
>
> AFAICS this procedure is *not* suggested anywhere in our
> documentation.

My description is abbreviated, but the process follows exactly that
which is documented in Chapter 23.3 "Continuous Archiving and Point-
In-Time Recovery". I use the term "online backup" to refer to this
process.

> What's bothering me about it is that I don't see anything guaranteeing
> that you have a full set of WAL files back to pg_start_backup(). If
> checkpoints occur during step 4, as is virtually certain given you say
> step 4 takes 20 minutes, then WAL files you need will get
> renamed/recycled. What are you doing to "collect the WAL files"
> exactly?
>

"archive_command" is defined in postgresql.conf in order to save the
WAL files to an archive location. The start and end WAL file names
are obtained from the backup history file. All WAL files between the
first and last inclusive are saved with the data cluster snapshot.
In any event, starting postgres with this saved cluster initiates the
recovery process and postgres does come up ready.

One thing I do find interesting: while the backup created with my
"online" / Chap 23.3 processing contains 18 WAL files, only the last
3 (most recent) are used during the recovery process. Is that normal?

> Also, what do you consider to be an "offline backup", and what's
> different in your process for creating that?

My term "offline backup" refers to the backup process documented in
Chapter 23.2 "File System Level Backup". In this backup postgres is
shutdown and a snapshot of the data cluster is taken.

It seems odd that I'm the only person having this problem. My
database is not that big (10MM tuples in 100 tables or so). So I
question my processing or the environment. After reviewing the
process I find my backup procedure matches the documentation. Could
this be FreeBSD 6.1 or the filesystem? How many folks are using
FreeBSD 6.1 and UFS2?

- r.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Magnus Hagander 2007-03-12 19:00:22 Re: BUG #3131: on win pg_connect() is very slow
Previous Message Magnus Hagander 2007-03-12 16:50:58 BUG #3147: Test bug report