Re: BUG #3110: Online Backup introduces Duplicate OIDs

From: Randy Isbell <jisbell(at)cisco(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3110: Online Backup introduces Duplicate OIDs
Date: 2007-03-06 19:31:24
Message-ID: 0D84669C-A37B-435A-84CA-756EBFC34103@cisco.com
Lists: pgsql-bugs

Thanks for the reply.

This is not a pg_dump. I am performing an online backup using the
"pg_start_backup()" and "pg_stop_backup()" mechanism.

Here is another look at the same table but with the OID values
included (this is from a different dump, not the initial bug report):

sn=# select oid, ac_zone, ac_host, ac_type, ac_data, count(oid)
       from at_dns
       group by oid, ac_zone, ac_host, ac_type, ac_data
       having count(oid) > 1;
   oid   |            ac_zone            | ac_host | ac_type |        ac_data         | count
---------+-------------------------------+---------+---------+------------------------+-------
 7049453 | e164.lr0007.nqa5.l1.cisco.com | @       | soa     | ns1.nqa5.l1.cisco.com. |     2
 7049454 | e164.nqa5.l1.cisco.com        | @       | soa     | ns1.nqa5.l1.cisco.com. |     2
 7049503 | e164.lr0008.nqa5.l1.cisco.com | @       | soa     | ns1.nqa5.l1.cisco.com. |     2
 7049512 | e164.lr0006.nqa5.l1.cisco.com | @       | soa     | ns1.nqa5.l1.cisco.com. |     2
 7049515 | e164.lr0005.nqa5.l1.cisco.com | @       | soa     | ns1.nqa5.l1.cisco.com. |     2
 7049531 | e164.lr0009.nqa5.l1.cisco.com | @       | soa     | ns1.nqa5.l1.cisco.com. |     2
(6 rows)

So the scenario is:

1. Verify that the original database is free of duplicate primary key
values. This is done by running queries similar to the above for
each primary key on all user tables.
2. Create a load on the database, about 30 transactions / sec.
3. Issue pg_start_backup()
4. Save off the data cluster
5. Issue pg_stop_backup()
6. Collect the WAL files
7. Create a big hairy tar file with the stuff from items 4 and 6.
8. Take the big hairy tar file to another server running the same pg
8.2.3, untar and start postgres
9. On the server from item 8, run the same queries from item 1.
Voilà, behold the duplicates.
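
The check in items 1 and 9 can be sketched roughly as below. This is
only an illustration, not the exact script used here: the helper name
duplicate_key_query is made up, the key columns must be supplied by
hand for each table, and identifiers are pasted in without any quoting
or escaping.

```python
def duplicate_key_query(table, key_columns):
    """Build a query listing key values that occur more than once.

    Hypothetical helper: table and column names are inserted as-is,
    so they must be trusted, plain identifiers.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    cols = ", ".join(key_columns)
    return (
        f"SELECT {cols}, count(*) FROM {table} "
        f"GROUP BY {cols} HAVING count(*) > 1;"
    )


if __name__ == "__main__":
    # Example: the OID check on the table shown above.
    print(duplicate_key_query("at_dns", ["oid"]))
```

Running the generated statement against each user table before the
backup and again on the restored server gives the comparison described
in items 1 and 9; an empty result means no duplicates for that key.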

I've run a number of these scenarios and I've found some common
features which may be clues:

a. The duplicate records result from a DELETE or UPDATE query
b. They occur near the time of the pg_start_backup() or
pg_stop_backup() call. That is, the DELETE or UPDATE query is issued
within 90 to 110 seconds of either the start or the stop, whereas the
backup as a whole takes 20 min to create.
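
To make item b concrete, a statement can be flagged as "near" a backup
boundary with something like the following. This is a sketch under
assumptions: the function name near_backup_boundary is invented for
illustration, the 110-second default simply mirrors the upper figure
above, and the timestamps would have to come from the server log and
the times the backup calls were issued.

```python
from datetime import datetime, timedelta


def near_backup_boundary(stmt_time, backup_start, backup_stop,
                         window_seconds=110):
    """Return True if stmt_time is within window_seconds of the
    backup start or the backup stop (hypothetical helper)."""
    window = timedelta(seconds=window_seconds)
    return (abs(stmt_time - backup_start) <= window
            or abs(stmt_time - backup_stop) <= window)


if __name__ == "__main__":
    # Made-up times for illustration: a 20-minute backup.
    start = datetime(2007, 3, 6, 12, 0, 0)
    stop = start + timedelta(minutes=20)
    print(near_backup_boundary(start + timedelta(seconds=100), start, stop))
    print(near_backup_boundary(start + timedelta(minutes=10), start, stop))
```

A statement issued in the middle of the 20 minute backup falls outside
both windows, which is the case where I have not seen duplicates.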

Is there a transaction log file dump utility? It would be nice to
see if the corruption is actually in the WAL files.

Regards,
- r.

On Mar 6, 2007, at 11:43 AM, Tom Lane wrote:

> "Randy Isbell" <jisbell(at)cisco(dot)com> writes:
>> When restoring the output of an online backup, many tables now have
>> duplicate OID values / primary keys, viz:
>> ...
>> sn=# reindex table at_dns;
>> ERROR: could not create unique index
>
> I'm confused. You're claiming that the reload succeeds, but after
> that a reindex fails? Are there actually duplicate OIDs in the dump
> file, as evidenced by looking at it? What pg_dump options are you
> using (I suppose -o at least)? If using pg_restore, what options
> there?
>
> regards, tom lane
