BUG #6094: Streaming replication does not catch up when writing enough data

From: "David Hartveld" <david(dot)hartveld(at)mendix(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6094: Streaming replication does not catch up when writing enough data
Date: 2011-07-07 12:05:59
Message-ID: 201107071205.p67C5x79050836@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 6094
Logged by: David Hartveld
Email address: david(dot)hartveld(at)mendix(dot)com
PostgreSQL version: 9.1-beta2
Operating system: Debian GNU/Linux 6.0.2 "Squeeze"
Description: Streaming replication does not catch up when writing enough data
Details:

After creating two new clusters and setting them up as master and slave
(in async mode, following the current 9.1 docs), running a large SQL script
(creating a db, tables, sequences, etc., and filling them with data through
COPY) completes properly on the master, but the changes do not stream to the
slave, i.e. the slave does not catch up. In the master log, the following
line is printed many times:

2011-07-07 13:48:27 CEST LOG: could not send data to client: Connection reset by peer

In the slave log, the following corresponding lines are printed, for each
log line on the master:

2011-07-07 13:48:27 CEST LOG: streaming replication successfully connected to primary
2011-07-07 13:48:27 CEST LOG: record with zero length at 0/51E0010
2011-07-07 13:48:27 CEST FATAL: terminating walreceiver process due to administrator command
cp: cannot stat `/walshipping/9.1/test/000000010000000000000005': No such file or directory
2011-07-07 13:48:27 CEST LOG: record with zero length at 0/51E4010
cp: cannot stat `/walshipping/9.1/test/000000010000000000000005': No such file or directory

The 'record with zero length' line is printed many times.
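
For reference, the WAL locations in the slave log can be mapped to the segment file that restore_command is asked to copy. A minimal sketch, assuming timeline 1 and the default 16 MiB segment size; wal_segment_name is a hypothetical helper written for illustration, not a PostgreSQL tool:

```shell
# Hypothetical helper: map a WAL location such as 0/51E0010 to the name of
# the segment file that contains it.
# Assumptions: timeline 1 unless given, default 16 MiB segment size.
wal_segment_name() {
    wsn_location=$1
    wsn_timeline=${2:-1}
    wsn_log=${wsn_location%%/*}      # hex log id, before the slash
    wsn_offset=${wsn_location##*/}   # hex byte offset, after the slash
    # File name = timeline, log id, and segment number, each as 8 hex digits.
    printf '%08X%08X%08X\n' "$wsn_timeline" "$((0x$wsn_log))" \
        "$((0x$wsn_offset / 16777216))"
}

wal_segment_name 0/51E0010
wal_segment_name 0/51E4010
```

Both locations fall in segment 000000010000000000000005, which is the file that cp reports as missing from the archive directory.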

I have configured the clusters with the following 'script':

EDITOR=/usr/bin/vim

MASTER=pg-db-01
SLAVE=pg-db-02
PORT=3000
VERSION=9.1
CLUSTERNAME=test

BOTH
- Create 9.1 cluster on port 3000
# pg_createcluster -p $PORT $VERSION $CLUSTERNAME
- Add line 'host all all samenet trust' to pg_hba.conf.
# $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/pg_hba.conf
- Listen on all IPs: Change 'listen_addresses' to '*' in postgresql.conf.
# $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/postgresql.conf

MASTER
- Enable WAL archiving. Set the following configuration parameters in
postgresql.conf (and create directory /walshipping/9.1/test, owned by
postgres):
wal_level = hot_standby
archive_mode = on
archive_command = 'cp -i %p /walshipping/9.1/test/%f < /dev/null'
# $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/postgresql.conf
- To enable streaming replication, set the following configuration
parameters in postgresql.conf:
wal_keep_segments = 64 # * 16 MiB, 1 GiB disk space needed.
max_wal_senders = 1 # Or some other number at least equal to the number of standby servers.
# $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/postgresql.conf
- Also add line 'host replication postgres samenet trust' to pg_hba.conf
# $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/pg_hba.conf
- Start the cluster.
# pg_ctlcluster $VERSION $CLUSTERNAME start
- Create a base backup for the slave.
# psql -U postgres -h localhost -p $PORT \
-c "SELECT pg_start_backup('base', true)"
# rsync -a /var/lib/postgresql/$VERSION/$CLUSTERNAME/* \
/pgbackup/$VERSION/$CLUSTERNAME/
# psql -U postgres -h localhost -p $PORT \
-c "SELECT pg_stop_backup()"
# rm -rf /pgbackup/$VERSION/$CLUSTERNAME/{postmaster.pid,pg_xlog/*}
# cd /pgbackup/$VERSION
# tar jcvf $CLUSTERNAME.tar.bz2 ./$CLUSTERNAME/
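
The archive_command above uses cp -i with stdin redirected from /dev/null so that an already archived segment is not overwritten. The same intent can be sketched with an explicit existence test, which makes the failure status unambiguous. This is only an illustration: archive_wal is a hypothetical helper, and its arguments stand in for %p, the archive directory, and %f.

```shell
# Hypothetical helper: copy one WAL segment into the archive directory,
# refusing to overwrite a file that is already there.
#   $1 = path of the segment (what %p expands to)
#   $2 = archive directory
#   $3 = segment file name (what %f expands to)
archive_wal() {
    # A nonzero status signals that archiving failed for this segment.
    [ ! -f "$2/$3" ] && cp "$1" "$2/$3"
}

# Demonstration in a throwaway directory instead of /walshipping.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/archive"
printf 'segment data' > "$demo_dir/segment"

archive_wal "$demo_dir/segment" "$demo_dir/archive" 000000010000000000000005 \
    && echo "archived"
archive_wal "$demo_dir/segment" "$demo_dir/archive" 000000010000000000000005 \
    || echo "refused to overwrite"

rm -rf "$demo_dir"
```

The first call copies the file and the second one fails because the target already exists.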


SLAVE
- 'Restore' the created backup from the master.
# cd /var/lib/postgresql/$VERSION
# rm -rf $CLUSTERNAME.orig
# mv -f $CLUSTERNAME $CLUSTERNAME.orig
# tar jxvf /$CLUSTERNAME.tar.bz2
- Create recovery.conf with the following configuration parameters:
standby_mode = 'on'
primary_conninfo = 'host=$MASTER port=$PORT user=postgres'
restore_command = 'cp /walshipping/$VERSION/$CLUSTERNAME/%f %p'
# $EDITOR /var/lib/postgresql/$VERSION/$CLUSTERNAME/recovery.conf
- Start the cluster.
# chown -R postgres:postgres $CLUSTERNAME
# chmod 0700 $CLUSTERNAME
# pg_ctlcluster $VERSION $CLUSTERNAME start
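
Once both clusters are running, one way to see whether the slave catches up is to compare WAL locations from both sides. A minimal sketch, assuming the 9.1-era location format in which each log id covers 0xFF000000 bytes; wal_location_bytes and wal_lag_bytes are hypothetical helpers, and the sample inputs are the two locations from the slave log rather than live query results:

```shell
# Hypothetical helper: convert a WAL location "logid/offset" (both hex)
# into an absolute byte position.
# Assumption: each log id spans 0xFF000000 bytes, as in the 9.1 layout.
wal_location_bytes() {
    echo $(( 0x${1%%/*} * 0xFF000000 + 0x${1##*/} ))
}

# Hypothetical helper: byte distance from location $2 up to location $1.
wal_lag_bytes() {
    echo $(( $(wal_location_bytes "$1") - $(wal_location_bytes "$2") ))
}

# On a live setup the inputs could come from queries such as:
#   master: SELECT pg_current_xlog_location();
#   slave:  SELECT pg_last_xlog_receive_location();
# Here the two locations printed in the slave log are used instead.
wal_lag_bytes 0/51E4010 0/51E0010
```

A lag that keeps growing while the SQL script runs on the master would match the behaviour described in this report.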
