Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"

From: Christophe Pettus <xof(at)thebuild(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
Date: 2014-01-02 19:59:29
Message-ID: 4D7FE288-4F4D-4CE6-90DC-FA621795A71A@thebuild.com
Lists: pgsql-hackers

Greetings,

We've had two clients experience a crash on the secondary of a streaming replication pair, running PostgreSQL 9.3.2. In both cases, the messages were close to this example:

2013-12-30 18:08:00.464 PST,,,23869,,52ab4839.5d3d,16,,2013-12-13 09:47:37 PST,1/0,0,WARNING,01000,"page 45785 of relation base/236971/365951 is uninitialized",,,,,"xlog redo vacuum: rel 1663/236971/365951; blk 45794, lastBlockVacuumed 45784",,,,""
2013-12-30 18:08:00.465 PST,,,23869,,52ab4839.5d3d,17,,2013-12-13 09:47:37 PST,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"xlog redo vacuum: rel 1663/236971/365951; blk 45794, lastBlockVacuumed 45784",,,,""
2013-12-30 18:08:00.950 PST,,,23866,,52ab4838.5d3a,8,,2013-12-13 09:47:36 PST,,0,LOG,00000,"startup process (PID 23869) was terminated by signal 6: Aborted",,,,,,,,,""

In both cases, the relation indicated was a primary key index. In one case, rebuilding the primary key index made the problem go away, and it has not returned to date. In the second case, the problem came back even after a full dump / restore of the master database and a reimage of the secondary: it recurred at the same primary key index, although of course with a different OID value.
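
For anyone comparing their own logs against these, here is a minimal Python sketch that pulls the identifiers out of csvlog lines like the ones quoted above. It is an illustration, not a tool we ran: the column positions (severity at index 11, message at 13, context at 18) assume the 9.3 csvlog column order, and the function name parse_csvlog and the regular expressions are hypothetical helpers written against the exact wording of these two messages.

```python
import csv
import io
import re

# Two of the csvlog lines quoted in this message, verbatim.
SAMPLE = "\n".join([
    '2013-12-30 18:08:00.464 PST,,,23869,,52ab4839.5d3d,16,,2013-12-13 09:47:37 PST,1/0,0,WARNING,01000,"page 45785 of relation base/236971/365951 is uninitialized",,,,,"xlog redo vacuum: rel 1663/236971/365951; blk 45794, lastBlockVacuumed 45784",,,,""',
    '2013-12-30 18:08:00.465 PST,,,23869,,52ab4839.5d3d,17,,2013-12-13 09:47:37 PST,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"xlog redo vacuum: rel 1663/236971/365951; blk 45794, lastBlockVacuumed 45784",,,,""',
])

# Assumed csvlog column positions (0-based) for PostgreSQL 9.3.
SEVERITY, MESSAGE, CONTEXT = 11, 13, 18

# Patterns matched against the message and context text shown above.
PAGE_RE = re.compile(r"page (\d+) of relation base/(\d+)/(\d+) is uninitialized")
REDO_RE = re.compile(r"rel (\d+)/(\d+)/(\d+); blk (\d+), lastBlockVacuumed (\d+)")


def parse_csvlog(text):
    """Return one dict per log row with the identifiers found in it."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) <= CONTEXT:
            continue  # skip blank or truncated rows
        entry = {"severity": row[SEVERITY]}
        page = PAGE_RE.search(row[MESSAGE])
        if page:
            entry["uninitialized_page"] = int(page.group(1))
        redo = REDO_RE.search(row[CONTEXT])
        if redo:
            ts, db, rel, blk, last = (int(g) for g in redo.groups())
            entry.update(tablespace=ts, database=db, relfilenode=rel,
                         blk=blk, last_block_vacuumed=last)
        entries.append(entry)
    return entries


if __name__ == "__main__":
    for entry in parse_csvlog(SAMPLE):
        print(entry)
```

The relfilenode it extracts (365951 here) can then be looked up in the affected database with something like SELECT relname, relkind FROM pg_class WHERE relfilenode = 365951; to confirm which index is involved. Note that in this example the uninitialized page (45785) falls just after lastBlockVacuumed (45784) and before blk (45794).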

It looks like this has been experienced on 9.2.6, as well:

http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg(at)mail(dot)gmail(dot)com

Let me know if there's any further information I can provide.

Best,
--
-- Christophe Pettus
xof(at)thebuild(dot)com
