Data corruption after SAN snapshot

From:	Terry Schmitt <tschmitt(at)schmittworks(dot)com>
To:	pgsql-admin(at)postgresql(dot)org
Subject:	Data corruption after SAN snapshot
Date:	2012-08-07 22:23:33
Message-ID:	CAOOcyswLYBfJDuvNBPMkiNCGNKgK=SiexUuTVHCh2O+Y1T-sLw@mail.gmail.com
Lists:	pgsql-admin

Hi All,

I have a pretty strange issue that I'm looking for ideas on.
I'm using Postgres Plus Advanced Server 9.1, but I believe this problem is
relevant to community Postgres as well. It could certainly be an EDB bug,
and I am already working with them on this.

We are migrating 1TB+ from Oracle to PPAS. Our new environment consists of
a primary server with two "read-only" clones. We use NetApp SAN storage
and execute a NetApp consistent snapshot on the primary server and then
use flex clones for the read-only servers. The clones power up, do a short
recovery and all should be well. We have used this method for two
years, but with PPAS 8.4 on physical servers with ext4.
The new environment is RHEL 6.x guests running inside Red Hat Virtualization
using XFS and LVM.

The problem is that after the data load, we take a warm snapshot and the
cloned databases come up corrupt.
A classic example is: ERROR: could not read block 1 in file
"base/18511/13872": read only 0 of 8192 bytes. Looking at the data file, it
is 8k in size, so it holds only block 0 and we are obviously missing block 1.
So far I have identified indexes and sequences as corrupt, but I believe it
could be any object.
Since the snapshot is essentially a crash, this system is not crash
safe either.
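
The block arithmetic behind that error can be sketched as follows. This is
only an illustration: it assumes the default 8192-byte block size and
zero-based block numbers, and block_status is a hypothetical helper name,
not part of Postgres.

```python
import os

BLOCK_SIZE = 8192  # assumed default Postgres block size


def block_status(path, block_no):
    """Report how many complete blocks a file holds and whether a
    given zero-based block number is present."""
    size = os.path.getsize(path)
    complete_blocks = size // BLOCK_SIZE
    return {
        "size": size,
        "complete_blocks": complete_blocks,
        # Bytes left over after the last complete block (a torn tail).
        "trailing_bytes": size % BLOCK_SIZE,
        "has_block": block_no < complete_blocks,
    }
```

For an 8192-byte file, block_status(path, 1) reports one complete block and
has_block as False, which matches "read only 0 of 8192 bytes" for block 1.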

Looking through the timeline of events, it is clear that data exists in RAM
on the primary server, but is not being written out to the SAN for the
snapshot and hence is missing when the clone starts up. My first thought is
that fsync is not working. PPAS has fsync on and is using fdatasync.

I ran a rudimentary test using: dd if=/dev/zero of=dd_test2 bs=8k
count=128k conv=fdatasync and crashed the server immediately after dd
completed. Everything behaved as expected: with fsync or fdatasync, the file
exists after the crash and reboot. Leaving out the sync results in a missing
file after the crash/reboot, but that is expected. This simple test shows
that fdatasync is working, but does not prove that it works under load.

So, at this point, I don't know if the fdatasync is being issued, but not
honored by the OS stack, or if PPAS is even issuing the sync at all.

Does anyone have a solid method to test if fdatasync is working correctly, or
thoughts on troubleshooting this? It is extremely time-consuming to
replicate the problem, and even then the corruption moves around, so it's
hard to know immediately if there is corruption at all. I'm hoping to
utilize a tool set outside of Postgres to positively eliminate the OS stack.
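
One way such an outside-of-Postgres check could look is sketched below. It
is a minimal sketch, not a proven tool: the function names, the block layout
(sequence number plus CRC32) and the idea of recording acknowledgements
somewhere that survives the crash are all assumptions made for illustration.
The writer appends 8 kB blocks and calls fdatasync after each one; the
verifier then checks that every block acknowledged before the snapshot is
present and intact on the clone.

```python
import os
import struct
import zlib

BLOCK_SIZE = 8192                 # same size as the blocks discussed above
HEADER = struct.Struct("<QI")     # sequence number, CRC32 of the payload
# Fall back to fsync on platforms without fdatasync.
_sync = getattr(os, "fdatasync", os.fsync)


def make_block(seq):
    """Build one block: header followed by random payload."""
    payload = os.urandom(BLOCK_SIZE - HEADER.size)
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def write_blocks(path, count, on_ack=print):
    """Append `count` blocks, syncing after each one. on_ack(seq) is
    called only after the sync returned, so it should record seq
    somewhere that survives the crash (for example another host)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            view = memoryview(make_block(seq))
            while len(view) > 0:          # handle short writes
                written = os.write(fd, view)
                view = view[written:]
            _sync(fd)
            on_ack(seq)
    finally:
        os.close(fd)


def verify_blocks(path, last_acked):
    """Return the acknowledged sequence numbers (0..last_acked) that
    are missing or fail their checksum."""
    bad = []
    with open(path, "rb") as f:
        for seq in range(last_acked + 1):
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                # File ends early: this and all later blocks are lost.
                bad.extend(range(seq, last_acked + 1))
                break
            got_seq, crc = HEADER.unpack_from(block)
            if got_seq != seq or zlib.crc32(block[HEADER.size:]) != crc:
                bad.append(seq)
    return bad
```

The intended use would be to run write_blocks on the primary (ideally
several writers at once to generate load) with the acknowledgements logged
off-host, take the snapshot, and then run verify_blocks on the clone with
the last sequence number acknowledged before the snapshot began. Any
non-empty result would point at the OS or storage stack rather than at
Postgres. Note that this sketch syncs only the file itself, so it does not
test whether the creation of a brand-new file is durable.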

Sorry for the lengthy post, but hopefully it's clear what is going on.

Thanks!
T
