Re: BUG #3110: Online Backup introduces Duplicate OIDs

From: Randy Isbell <jisbell(at)cisco(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3110: Online Backup introduces Duplicate OIDs
Date: 2007-03-08 20:54:40
Message-ID: 4AF85E85-FA73-400E-9589-9BFCD88F62A7@cisco.com
Lists: pgsql-bugs

Answers inline.

On Mar 7, 2007, at 3:54 PM, Tom Lane wrote:

> Randy Isbell <jisbell(at)cisco(dot)com> writes:
>> Here you go:
>> SELECT
>> ctid,xmin,xmax,cmin,cmax,oid,*
>
> Thanks. This is real interesting, because none of the rows have
> xmax/cmax set, so it doesn't appear that they were meant to have been
> updated out of existence.
>
>> For the at_dns table, it appears one column (ac_soa_serial) changes.
>
> Does that correspond to something your application does, ie UPDATE
> ac_soa_serial to a new value without changing anything else?

Yes. In fact, the app may perform an update even when no columns
change.

> I'm trying
> to guess if the duplicates arose by means of a misfiring UPDATE, or if
> they were independent insertions. Is it plausible that two rows that
> are the same except for ac_soa_serial would be inserted by your app?

Very unlikely. I am required to keep an audit trail of all
changes made to the database. This is done by triggers on each table
which write to a corresponding mirror table in an audit schema. There
is a trigger for each insert, update, and delete. I'm using this
audit information to try to isolate the problem. What I found is
that duplicates never arise from INSERTs, only from UPDATEs and DELETEs.
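For readers unfamiliar with this kind of audit setup, here is a minimal sketch of a per-table audit trigger in PL/pgSQL, modeled on the auditing example in the PostgreSQL documentation. It is not the schema or trigger code used in this report. The table at_dns_demo, its dns_name column, the audit.at_dns_demo mirror table and its op/changed_at/changed_by columns are all hypothetical, and only ac_soa_serial comes from the thread. One function is shared by three row-level AFTER triggers, one each for INSERT, UPDATE and DELETE.

```sql
-- Hypothetical stand-in for the real at_dns table: only ac_soa_serial
-- appears in the thread; dns_name is invented for illustration.
CREATE TABLE at_dns_demo (
    dns_name      text PRIMARY KEY,
    ac_soa_serial integer NOT NULL
);

CREATE SCHEMA audit;

-- Hypothetical mirror table: audit columns first, then a copy of every
-- column of the source table (LIKE does not copy the primary key, so the
-- same dns_name may appear in many audit rows).
CREATE TABLE audit.at_dns_demo (
    op         char(1)     NOT NULL,  -- 'I', 'U' or 'D'
    changed_at timestamptz NOT NULL,
    changed_by text        NOT NULL,
    LIKE at_dns_demo
);

-- Copies the new row (INSERT/UPDATE) or the old row (DELETE) into the
-- mirror table, tagged with the operation, time and user.
CREATE OR REPLACE FUNCTION audit.at_dns_demo_audit() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO audit.at_dns_demo SELECT 'D', now(), user, OLD.*;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit.at_dns_demo SELECT 'U', now(), user, NEW.*;
    ELSE
        INSERT INTO audit.at_dns_demo SELECT 'I', now(), user, NEW.*;
    END IF;
    RETURN NULL;  -- the result is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

-- One trigger per operation, all calling the same function.
CREATE TRIGGER at_dns_demo_audit_ins AFTER INSERT ON at_dns_demo
    FOR EACH ROW EXECUTE PROCEDURE audit.at_dns_demo_audit();
CREATE TRIGGER at_dns_demo_audit_upd AFTER UPDATE ON at_dns_demo
    FOR EACH ROW EXECUTE PROCEDURE audit.at_dns_demo_audit();
CREATE TRIGGER at_dns_demo_audit_del AFTER DELETE ON at_dns_demo
    FOR EACH ROW EXECUTE PROCEDURE audit.at_dns_demo_audit();
```

Because row-level UPDATE triggers fire even when no column value changes, an application that issues no-op UPDATEs (as described above) still produces one 'U' audit row per affected row.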

Also, recall from my previous note that the duplication happens near the
times of pg_start_backup() and pg_stop_backup(). Based on the
audit schema information I collected, numerous updates and deletes
made in the middle of the backup window cause no duplication. In my
environment the duplication happens within 2 minutes of the start or
stop. This may be incidental, but I've seen it on 8 of 10
backup/restore runs.

> If the latter, a possible theory is that the OID counter is somehow
> being rolled back by the dump/reload process.

The reload process is simply: start postgres and let it replay the
necessary WAL files.

Is there a way to determine if the WAL file data is bad? It would
be helpful to know if the problem is caused by the backup, or if
something is wrong in the replay of the WAL files.

>
> regards, tom lane

- r.
