On Tom Lane's advice, we upgraded to Postgres 8.0.8. We also upgraded
slony to 1.1.5, due to some rpm issues. Apart from that everything is
as described below.

We were able to corrupt the index within 90 minutes of starting up our
cluster. A slony-induced vacuum was under way on the provider at the
time the subscriber failed.

What can we do to help identify the cause of this? We have a test
system that seems able to reproduce this fairly easily.

On Wed, 2006-06-28 at 09:28 -0700, Marc Munro wrote:
> We have now experienced index corruption on two separate but identical
> slony clusters. In each case the slony subscriber failed after
> attempting to insert a duplicate record. In each case reindexing the
> sl_log_1 table on the provider fixed the problem.
>
> The latest occurrence was on our production cluster yesterday. This has
> only happened since we performed kernel upgrades and we are uncertain
> whether this represents a kernel bug, or a postgres bug exposed by
> different timings in the new kernel.
>
> Our systems are:
>
> Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
> Kernel 18.104.22.168 #8 SMP x86_64 x86_64 x86_64 GNU/Linux
> kernel boot option: elevator=deadline
> 16 Gigs of RAM
> Bonded e1000/tg3 NICs with 8192 MTU.
> Slony 1.1.0
>
> NetApp FAS270 OnTap 7.0.3
> Mounted with the NFS options
> Jumbo frames 8192 MTU.
>
> All postgres data and logs are stored on the netapp.
>
> In the latest episode, the index corruption was coincident with a
> slony-induced vacuum. I don't know if this was the case with our test
> system failures.