Recurring corrupted page pointer panics on 9.4.4 hot-standby replica

From: Michael Robinson <michael(at)snupps(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Recurring corrupted page pointer panics on 9.4.4 hot-standby replica
Date: 2015-10-26 11:05:27
Message-ID: CAA7ibA=BeQQz=yzjdaYMD9fesy-cEpgpn9Eg+ow-TdH-LgeV8A@mail.gmail.com
Lists: pgsql-general

Hi,

Two days ago, we started getting panics on a hot-standby replica as follows:

2015-10-24 14:16:46.489 UTC PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176
2015-10-24 14:16:46.490 UTC CONTEXT: xlog redo unlink_page: rel 1663/16416/254063; dead 11796080; left 1365037; right 3024097; btpo_xact 64542957; leaf 2456241; leafleft 11130443; leafright 1350594; topparent 4294967295
2015-10-26 04:51:40.530 UTC PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176
2015-10-26 04:51:40.530 UTC CONTEXT: xlog redo unlink_page: rel 1663/16416/254063; dead 9922828; left 2449142; right 3415026; btpo_xact 64982371; leaf 2290440; leafleft 5120238; leafright 1903321; topparent 4294967295
2015-10-26 10:24:02.613 UTC PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176
2015-10-26 10:24:02.613 UTC CONTEXT: xlog redo unlink_page: rel 1663/16416/401628; dead 2348571; left 2348281; right 2351431; btpo_xact 65010718; leaf 2348740; leafleft 2348434; leafright 2351568; topparent 4294967295
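
For anyone reading the values above, here is a rough Python sketch of how the two log lines can be interpreted. It is not PostgreSQL code. The constants (8192-byte blocks, a 24-byte fixed page header, 8-byte alignment) are assumptions that match a default 64-bit build, and the function names are made up for illustration. The check mirrors the usual page-header invariant: header size <= lower <= upper <= special <= block size, with special aligned.

```python
import re

# Assumed defaults for a stock 64-bit build; adjust if the server differs.
BLCKSZ = 8192            # page size in bytes
PAGE_HEADER_SIZE = 24    # size of the fixed page header
MAXALIGN = 8             # alignment required for the special-space offset
INVALID_BLOCK = 0xFFFFFFFF  # 4294967295, the "no block" sentinel

def parse_pointers(line):
    """Extract (lower, upper, special) from a 'corrupted page pointers' line."""
    m = re.search(r"lower = (\d+), upper = (\d+), special = (\d+)", line)
    return tuple(int(g) for g in m.groups()) if m else None

def pointers_look_sane(lower, upper, special):
    """Return True if the three offsets satisfy the page-layout invariant."""
    return (PAGE_HEADER_SIZE <= lower <= upper <= special <= BLCKSZ
            and special % MAXALIGN == 0)

def parse_rel(line):
    """Split 'rel A/B/C' into tablespace OID, database OID and relfilenode."""
    m = re.search(r"rel (\d+)/(\d+)/(\d+)", line)
    if not m:
        return None
    keys = ("tablespace", "database", "relfilenode")
    return dict(zip(keys, (int(g) for g in m.groups())))

panic = "PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176"
context = "CONTEXT: xlog redo unlink_page: rel 1663/16416/254063; dead 11796080"

print(pointers_look_sane(*parse_pointers(panic)))  # False: lower < header size, lower > upper
print(parse_rel(context))
```

Under these assumptions, lower = 17 falls inside the page header and upper = 0 is below lower, so the page fails the check no matter what else it contains. The relfilenode taken from the rel field (254063 or 401628 here) can be looked up in pg_class on the affected database to find which index is involved, and topparent 4294967295 is the invalid-block sentinel rather than a real block number.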

The replica is running on a dedicated EC2 instance, and has been running
without any problems for several months. The build version is
9.4.4-1.pgdg14.04+1 from the apt repository, running on Ubuntu 14.04
Trusty. The database is around 440GB, and is under constant moderate
read-only load (100-1000 queries per second).

There have been no issues with the master database, nor have there been any
database shutdowns other than the panics.

I would be very grateful for any insights as to what may have caused this,
and how best to recover stable operation.

Best regards,
Michael Robinson
