Re: Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"

From: Omar Kilani <omar(dot)kilani(at)gmail(dot)com>
To: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
Date: 2014-01-03 21:40:56
Message-ID: CA+8F9hgxkzdQ4Hni-+FWPYUGT8q2q9au01MK3_=DVQ+m2Md5_Q@mail.gmail.com
Lists: pgsql-hackers

We had the same issues running 9.2.4:

[2013-10-15 00:23:01 GMT/0/15396] WARNING: page 8789807 of relation base/16429/2349631976 is uninitialized
[2013-10-15 00:23:01 GMT/0/15396] CONTEXT: xlog redo vacuum: rel 1663/16429/2349631976; blk 8858544, lastBlockVacuumed 0
[2013-10-15 00:23:01 GMT/0/15396] PANIC: WAL contains references to invalid pages
[2013-10-15 00:23:01 GMT/0/15396] CONTEXT: xlog redo vacuum: rel 1663/16429/2349631976; blk 8858544, lastBlockVacuumed 0
[2013-10-15 00:23:11 GMT/0/15393] LOG: startup process (PID 15396) was terminated by signal 6: Aborted
[2013-10-15 00:23:11 GMT/0/15393] LOG: terminating any other active server processes

This was also on an index. I ended up manually patching the heap files at that
block location to "fix" the problem. It happened again about 2 weeks
after that, then never again. It hit all connected secondaries.
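
For anyone trying to find "that block location" on disk, here is a minimal sketch of the arithmetic, not the exact procedure used above. It assumes a default build (8 kB pages, 1 GB relation segments); a server compiled with other values needs different constants. The function name and the returned tuple are made up for illustration.

```python
from typing import NamedTuple

# Assumptions: default PostgreSQL build settings.
BLOCK_SIZE = 8192            # bytes per page (default BLCKSZ)
BLOCKS_PER_SEGMENT = 131072  # pages per 1 GB segment file (default RELSEG_SIZE)


class BlockLocation(NamedTuple):
    path: str              # segment file, relative to the data directory
    block_in_segment: int  # page index within that segment file
    byte_offset: int       # byte offset of the page within that segment file


def locate_block(relpath: str, block: int) -> BlockLocation:
    """Map a relation path and block number to a segment file and offset.

    Hypothetical helper: segment 0 is the bare relation file, later
    segments carry a numeric suffix (".1", ".2", ...).
    """
    if block < 0:
        raise ValueError("block number must be non-negative")
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    path = relpath if segment == 0 else f"{relpath}.{segment}"
    return BlockLocation(path, block_in_segment, block_in_segment * BLOCK_SIZE)


if __name__ == "__main__":
    # Relation path and block numbers taken from the log excerpt above.
    for blk in (8789807, 8858544):
        print(blk, locate_block("base/16429/2349631976", blk))
```

Under those assumptions, both block numbers in the log fall in the same segment file, base/16429/2349631976.67.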

On Fri, Jan 3, 2014 at 12:50 PM, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com> wrote:
> On Thu, Jan 2, 2014 at 11:59 AM, Christophe Pettus <xof(at)thebuild(dot)com> wrote:
>> In both cases, the indicated relation was a primary key index. In one case, rebuilding the primary key index caused the problem to go away permanently (to date). In the second case, the problem returned even after a full dump / restore of the master database (that is, after a dump / restore of the master, and reimaging the secondary, the problem returned at the same primary key index, although of course with a different OID value).
>>
>> It looks like this has been experienced on 9.2.6, as well:
>>
>> http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg(at)mail(dot)gmail(dot)com
>
> This problem worries me a lot too. If someone is interested I still
> have a file system copy of the buggy cluster including WAL.
>
> --
> Kind regards,
> Sergey Konoplev
> PostgreSQL Consultant and DBA
>
> http://www.linkedin.com/in/grayhemp
> +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
> gray(dot)ru(at)gmail(dot)com
