[streaming replication] 9.1.3 streaming replication bug ?

From:	乔志强 <qiaozhiqiang(at)leadcoretech(dot)com>
To:	<pgsql-general(at)postgresql(dot)org>
Subject:	[streaming replication] 9.1.3 streaming replication bug ?
Date:	2012-04-09 10:33:06
Message-ID:	E81554BCB8813E49A8916AACC0503A850B4A913E@lc-shmail3.SHANGHAI.LEADCORETECH.COM
Lists:	pgsql-general pgsql-hackers

I am running postgresql-9.1.3-1-windows-x64.exe on Windows 2008 R2 x64.

There is 1 master and 1 standby. The standby is a synchronous standby using streaming replication (synchronous_standby_names = '*', archive_mode = off). The master outputs:
standby "walreceiver" is now the synchronous standby with priority 1
The standby outputs:
LOG: streaming replication successfully connected to primary

I then run a test program that writes and commits large blobs (random size, 10 to 1000 MB) to the master server in a loop, using 40 threads (40 sessions).
The master and standby run on the same machine, and the client runs on another machine over a 100 Mbps network.

But after a few minutes the master outputs:
requested WAL segment XXX has already been removed
and the standby outputs:
FATAL: could not receive data from WAL stream: FATAL: requested WAL segment XXX has already been removed

Question:
Why does the master delete a WAL segment before sending it to the standby in synchronous mode? Is this a streaming replication bug?

I see that if no standby is connected to the master when synchronous_standby_names = '*',
every commit waits until a standby connects to the master. That is good.

Should I use a bigger wal_keep_segments? I think the master should keep every WAL segment that has not yet been sent to an online standby (sync or async).
wal_keep_segments should only be needed for an offline standby.

When synchronous_standby_names is set for a sync standby and no standby is online, every commit waits until a standby connects to the master,
so wal_keep_segments is really only for an offline async standby.
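
For a rough sense of how many segments this workload can touch, here is a minimal sketch in Python. It assumes the default 16 MB WAL segment size and that writing a blob generates at least as much WAL as the blob's own size; the real volume will be higher because of record overhead and full-page writes. The function name and constants are only for illustration, and the result is a lower bound, not a recommended wal_keep_segments value.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024
MB = 1024 * 1024


def min_segments_for(wal_bytes: int) -> int:
    """Lower bound on the number of WAL segments needed to hold wal_bytes."""
    # Integer ceiling division, avoids floating-point rounding.
    return -(-wal_bytes // WAL_SEGMENT_BYTES)


if __name__ == "__main__":
    # Smallest and largest single blob from the test (10 MB and 1000 MB).
    print("one 10 MB blob:   ", min_segments_for(10 * MB))
    print("one 1000 MB blob: ", min_segments_for(1000 * MB))
    # Hypothetical worst case: all 40 sessions writing a 1000 MB blob at once.
    print("40 x 1000 MB:     ", min_segments_for(40 * 1000 * MB))
```

Under these assumptions a single large blob already spans dozens of segments, so a standby that falls even one blob behind can need far more segments than a small wal_keep_segments retains.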

////////////////////////////////////////

master server output:
LOG: database system was interrupted; last known up at 2012-03-30 15:37:03 HKT
LOG: database system was not properly shut down; automatic recovery in progress

LOG: redo starts at 0/136077B0
LOG: record with zero length at 0/17DF1E10
LOG: redo done at 0/17DF1D98
LOG: last completed transaction was at log time 2012-03-30 15:37:03.148+08
FATAL: the database system is starting up
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
///////////////////// the standby is a synchronous standby
LOG: standby "walreceiver" is now the synchronous standby with priority 1
/////////////////////
LOG: checkpoints are occurring too frequently (16 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (23 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (24 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (20 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (22 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
FATAL: requested WAL segment 000000010000000000000032 has already been removed
FATAL: requested WAL segment 000000010000000000000032 has already been removed
FATAL: requested WAL segment 000000010000000000000032 has already been removed
LOG: checkpoints are occurring too frequently (8 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
FATAL: requested WAL segment 000000010000000000000032 has already been removed

////////////////////////
standby server output:
LOG: database system was interrupted while in recovery at log time 2012-03-30 14:44:31 HKT
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
LOG: redo starts at 0/16E4760
LOG: consistent recovery state reached at 0/12D984D8
LOG: database system is ready to accept read only connections
LOG: record with zero length at 0/17DF1E68
LOG: invalid magic number 0000 in log file 0, segment 50, offset 6946816
LOG: streaming replication successfully connected to primary
FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000000000000032 has already been removed
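
To relate the segment file name in the FATAL message to the "log file 0, segment 50" line in the standby output, the name can be decoded by hand. The sketch below assumes the 9.1-era naming scheme (three 8-digit hexadecimal fields: timeline, log file id, segment number) and the default 16 MB segment size; the helper names are hypothetical and not part of PostgreSQL.

```python
from typing import NamedTuple

# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


class WalSegment(NamedTuple):
    timeline: int
    log_id: int
    segment: int


def parse_wal_filename(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL file name into its three 8-digit fields."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {len(name)}: {name!r}")
    # int(..., 16) raises ValueError on non-hex characters.
    return WalSegment(
        timeline=int(name[0:8], 16),
        log_id=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )


def segment_start_location(seg: WalSegment) -> str:
    """WAL location of the first byte of the segment, in 'logid/offset' form."""
    return f"{seg.log_id:X}/{seg.segment * WAL_SEGMENT_BYTES:X}"


if __name__ == "__main__":
    seg = parse_wal_filename("000000010000000000000032")
    print(seg)                          # timeline 1, log file 0, segment 50
    print(segment_start_location(seg))  # 0/32000000
```

Hex 32 is decimal 50, so the segment the standby requests is the same one in which it reported the invalid magic number just before connecting.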

In response to

Re: 9.1.3 Standby catchup mode at 2012-04-06 13:07:10 from Adrian Klaver

Responses

Re: [streaming replication] 9.1.3 streaming replication bug ? at 2012-04-09 13:32:30 from Condor
Re: [streaming replication] 9.1.3 streaming replication bug ? at 2012-04-09 13:49:25 from Adrian Klaver
Re: [streaming replication] 9.1.3 streaming replication bug ? at 2012-04-10 15:07:41 from Fujii Masao
