Re: testing HS/SR - invalid magic number

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: testing HS/SR - invalid magic number
Date: 2010-04-14 06:23:42
Message-ID: 4BC55F6E.4000708@enterprisedb.com
Lists: pgsql-hackers

Erik Rijkers wrote:
> This replication test that was working well earlier (it ran daily), stopped working
> after reinstall of new instances of cvs HEAD. I think the change must have been today (or at least
> recent).
> ...
> -- logfile standby:
> ...
> 2010-04-14 02:21:11 CEST 5601 start=2010-04-14 02:18:22 CEST FATAL: could not receive data from
> WAL stream: FATAL: requested WAL segment 000000010000000000000032 has already been removed
>
> cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000032':
> No such file or directory
> 2010-04-14 02:21:11 CEST 5598 start=2010-04-14 02:18:22 CEST LOG: invalid magic number 0000 in
> log file 0, segment 50, offset 13795328
> cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000032':
> No such file or directory
> 2010-04-14 02:21:11 CEST 5784 start=2010-04-14 02:21:11 CEST LOG: streaming replication
> successfully connected to primary

This is probably because of this change:

> date: 2010/04/12 09:52:29; author: heikki; state: Exp; lines: +71 -23
> Change the logic to decide when to delete old WAL segments, so that it
> doesn't take into account how far the WAL senders are. This way a hung
> WAL sender doesn't prevent old WAL segments from being recycled/removed
> in the primary, ultimately causing the disk to fill up. Instead add
> standby_keep_segments setting to control how many old WAL segments are
> kept in the primary. This also makes it more reliable to use streaming
> replication without WAL archiving, assuming that you set
> standby_keep_segments high enough.
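
As a rough aid for choosing a value, the sketch below estimates how much disk space a given standby_keep_segments setting retains and how long the standby could lag before a needed segment is gone. It assumes the default 16 MB WAL segment size and a steady WAL generation rate; the function names and the example numbers are hypothetical, and actual removal also depends on checkpoint timing.

```python
# Assumption: default WAL segment size of 16 MB (a compile-time setting).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(keep_segments, segment_bytes=WAL_SEGMENT_BYTES):
    """Approximate disk space held by the extra segments kept for standbys."""
    if keep_segments < 0:
        raise ValueError("keep_segments must be non-negative")
    return keep_segments * segment_bytes


def seconds_of_headroom(keep_segments, wal_bytes_per_second,
                        segment_bytes=WAL_SEGMENT_BYTES):
    """Approximate time a standby may fall behind before segments are removed.

    Assumes WAL is generated at a constant rate, which real workloads rarely do.
    """
    if wal_bytes_per_second <= 0:
        raise ValueError("wal_bytes_per_second must be positive")
    return retained_wal_bytes(keep_segments, segment_bytes) / wal_bytes_per_second


if __name__ == "__main__":
    # Illustrative values only: 64 segments, 4 MB/s of WAL.
    print(retained_wal_bytes(64))
    print(seconds_of_headroom(64, 4 * 1024 * 1024))
```

With these illustrative numbers, 64 segments retain 1 GB of WAL, which covers a little over four minutes of lag at 4 MB/s.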

If you generate WAL on the primary faster than the standby can stream
it, the primary will eventually delete a WAL segment that hasn't been
streamed to the standby yet, hence the "requested WAL segment
000000010000000000000032 has already been removed" error. The primary
shouldn't remove a segment before it has been successfully archived,
though, and your logs show that the standby can't find that file in the
archive either. Is archiving set up properly?
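
To see that both messages in the standby log refer to the same file, the segment file name can be decoded by hand. A minimal sketch, assuming the usual 24-hex-digit naming (8 digits each for timeline, log file and segment) and the default 16 MB segment size; the function name is hypothetical:

```python
import string

# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_wal_segment_name(name):
    """Split a WAL segment file name into (timeline, log file, segment)."""
    if len(name) != 24 or not all(c in string.hexdigits for c in name):
        raise ValueError("expected 24 hexadecimal digits: %r" % name)
    timeline = int(name[0:8], 16)
    log_file = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_file, segment


if __name__ == "__main__":
    # The segment named in the "has already been removed" error.
    print(parse_wal_segment_name("000000010000000000000032"))  # (1, 0, 50)
    # The offset from the "invalid magic number" line lies inside one segment.
    print(13795328 < WAL_SEGMENT_BYTES)  # True
```

Segment 0x32 is decimal 50, so "log file 0, segment 50" in the "invalid magic number" message is the same file, 000000010000000000000032, that the primary reports as removed.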

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
