Re: BUG #13822: Slave terminated - WAL contains references to invalid page

From: <Marek(dot)Petr(at)tieto(dot)com>
To: <michael(dot)paquier(at)gmail(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13822: Slave terminated - WAL contains references to invalid page
Date: 2015-12-22 12:05:50
Message-ID: 5f921d78aaa04807b4c809e0ff130b11@C105S135VM024.eu.tieto.com
Lists: pgsql-bugs

Hello,

Yes, the relation you're asking about is currently 39GB. Rebuilding the slave helped, but in another environment the error occurred again even after the rebuild.
The following two occurrences are from different PostgreSQL master/slave systems (using streaming replication) running on different machines:

2015-12-15 13:05:39 CET @ WARNING: page 4333275 of relation base/16422/17230 is uninitialized
2015-12-15 13:05:39 CET @ CONTEXT: xlog redo visible: rel 1663/16422/17230; blk 4333275
2015-12-15 13:05:39 CET @ PANIC: WAL contains references to invalid pages
2015-12-15 13:05:39 CET @ CONTEXT: xlog redo visible: rel 1663/16422/17230; blk 4333275
2015-12-15 13:05:39 CET @ LOG: startup process (PID 8963) was terminated by signal 6: Aborted
2015-12-15 13:05:39 CET @ LOG: terminating any other active server processes

2015-12-22 00:25:11 CET @ WARNING: page 71566 of relation base/16422/23253 is uninitialized
2015-12-22 00:25:11 CET @ CONTEXT: xlog redo visible: rel 1663/16422/23253; blk 71566
2015-12-22 00:25:11 CET @ PANIC: WAL contains references to invalid pages
2015-12-22 00:25:11 CET @ CONTEXT: xlog redo visible: rel 1663/16422/23253; blk 71566
2015-12-22 00:25:12 CET @ LOG: startup process (PID 24434) was terminated by signal 6: Aborted
2015-12-22 00:25:12 CET @ LOG: terminating any other active server processes

select relname from pg_class where relfilenode in ('17230','23253');
relname
----------------
pg_toast_17225
pg_toast_23246
(2 rows)

The first TOAST relation is 34GB, the second is 2452 MB.
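
To look at the referenced pages on disk, the block numbers from the log lines can be mapped to a segment file and a byte offset. This is only a minimal sketch: it assumes the default 8 kB block size and 1 GB segment files (both are compile-time options, so a custom build may differ), and `locate_block` is a made-up helper name, not a PostgreSQL function.

```python
BLOCK_SIZE = 8192                 # assumed default block size, in bytes
SEGMENT_SIZE = 1024 ** 3          # assumed default segment file size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE


def locate_block(relpath, block):
    """Return (segment file, byte offset of the block inside that file)."""
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are named <relfilenode>.<n>.
    filename = relpath if segment == 0 else f"{relpath}.{segment}"
    return filename, block_in_segment * BLOCK_SIZE


# Relation paths and block numbers taken from the two log excerpts above.
for relpath, block in [("base/16422/17230", 4333275), ("base/16422/23253", 71566)]:
    filename, offset = locate_block(relpath, block)
    print(f"block {block}: file {filename}, offset {offset}")
```

Under those assumptions the first page would be in segment file base/16422/17230.33 and the second in the unsuffixed file base/16422/23253, both relative to the data directory.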

Is it possible to get more information from deeper logging in case it occurs again?

Regards
Marek

-----Original Message-----
From: Michael Paquier [mailto:michael(dot)paquier(at)gmail(dot)com]
Sent: Friday, December 18, 2015 6:05 AM
To: Petr Marek <Marek(dot)Petr(at)tieto(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #13822: Slave terminated - WAL contains references to invalid page

On Thu, Dec 17, 2015 at 9:50 PM, <marek(dot)petr(at)tieto(dot)com> wrote:
> Several days after one in-place and one out-of-place upgrade from 9.3
> to 9.4, the following event occurred in both environments:
>
> 2015-12-15 22:35:18 CET @ WARNING: page 4119662 of relation
> base/16422/18134 is uninitialized
> 2015-12-15 22:35:18 CET @ CONTEXT: xlog redo visible: rel
> 1663/16422/18134; blk 4119662
> 2015-12-15 22:35:18 CET @ PANIC: WAL contains references to invalid
> pages
> 2015-12-15 22:35:18 CET @ CONTEXT: xlog redo visible: rel
> 1663/16422/18134; blk 4119662
> 2015-12-15 22:35:18 CET @ LOG: startup process (PID 22269) was
> terminated by signal 6: Aborted
> 2015-12-15 22:35:18 CET @ LOG: terminating any other active server
> process
>
> Once it was a TOAST table and the other time a regular table.

This indicates some data corruption. Page 4119662 implies a relation size of at least 31GB, but with so little information it is hard to guess what happened. Is 31GB more or less the size of this relation? If you deploy a slave from a fresh base backup, do you still see the error? That is unlikely if this is already the second time you are seeing the problem, but it may be corruption within the WAL segments themselves.
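
The 31GB estimate can be reproduced from the block number alone. The sketch below assumes the default 8 kB block size (a compile-time option, so this is an assumption about the build in question), and `min_relation_bytes` is an illustrative name only.

```python
BLOCK_SIZE = 8192  # assumed default block size, in bytes


def min_relation_bytes(block):
    """Smallest relation size, in bytes, that can contain the given block."""
    # Blocks are numbered from 0, so block N implies at least N + 1 blocks.
    return (block + 1) * BLOCK_SIZE


size = min_relation_bytes(4119662)
print(f"{size} bytes, about {size / 1024 ** 3:.2f} GiB")
```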
--
Michael
