Quick Links

Fwd: wal_retrieve_retry_interval

From:	Ludovic Vaugeois-Pepin <ludovicvp(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Fwd: wal_retrieve_retry_interval
Date:	2017-05-28 08:07:05
Message-ID:	CAAJDx8NVFvdgucum_pEmmBYSc-togzx_E_d2nughMeTTV9zSAw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Sun, May 28, 2017 at 6:54 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Sat, May 27, 2017 at 2:40 PM, Ludovic Vaugeois-Pepin
> <ludovicvp(at)gmail(dot)com> wrote:
> > Say, with 9.6.2, a hot_standby fails to connect to a replication slot:
> > FATAL: could not start WAL streaming: ERROR: replication slot
> "test3"
> > does not exist
> > or
> > FATAL: could not connect to the primary server: FATAL: the database
> > system is starting up
> >
> > Is there a way to reduce the time it takes until the next attempt? I
> > assumed, wrongly I think, that this would be wal_retrieve_retry_interval,
> > but it seems that it won't make a difference. I tried setting it to 3s,
> but
> > it seems to take 15s still. Here are two log samples:
>
> Could you double-check your configuration? If I set
> wal_retrieve_retry_interval to 1s on a standby, I am able to see a
> connection attempt from a WAL receiver happening with this interval of
> time in the case of repetitive failures.
>

It really is set at 3s on all servers (master and standbies) earlier in the
"deployment" process at the same time "listen_addresses", "hot_standby",
and others are set. This doesn't seem to happen every time I run tests. I
increased logging to DEBUG1. This is what I got when the problem occurred
again. Note that the empty line is there in the log file.

< 2017-05-28 09:29:36.127 CEST > LOG: database system was shut down in
recovery at 2017-05-28 09:29:03 CEST
< 2017-05-28 09:29:36.127 CEST > LOG: entering standby mode
< 2017-05-28 09:29:36.127 CEST > DEBUG: checkpoint record is at 0/80002B8
< 2017-05-28 09:29:36.128 CEST > DEBUG: redo record is at 0/80002B8;
shutdown TRUE
< 2017-05-28 09:29:36.128 CEST > DEBUG: next transaction ID: 0:2535; next
OID: 18660
< 2017-05-28 09:29:36.128 CEST > DEBUG: next MultiXactId: 1; next
MultiXactOffset: 0
< 2017-05-28 09:29:36.128 CEST > DEBUG: oldest unfrozen transaction ID:
1750, in database 1
< 2017-05-28 09:29:36.128 CEST > DEBUG: oldest MultiXactId: 1, in database
1
< 2017-05-28 09:29:36.128 CEST > DEBUG: commit timestamp Xid
oldest/newest: 0/0
< 2017-05-28 09:29:36.128 CEST > DEBUG: transaction ID wrap limit is
2147485397 <(214)%20748-5397>, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG: MultiXactId wrap limit is
2147483648 <(214)%20748-3648>, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG: starting up replication slots
< 2017-05-28 09:29:36.128 CEST > DEBUG: resetting unlogged relations:
cleanup 1 init 0
< 2017-05-28 09:29:36.129 CEST > DEBUG: initializing for hot standby
< 2017-05-28 09:29:36.129 CEST > DEBUG: recovery snapshots are now enabled
< 2017-05-28 09:29:36.129 CEST > LOG: consistent recovery state reached at
0/8000328
< 2017-05-28 09:29:36.129 CEST > LOG: invalid record length at 0/8000328:
wanted 24, got 0
< 2017-05-28 09:29:36.129 CEST > LOG: database system is ready to accept
read only connections
< 2017-05-28 09:29:36.135 CEST > FATAL: could not connect to the primary
server: FATAL: the database system is starting up

< 2017-05-28 09:29:36.135 CEST > DEBUG: invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.153 CEST > DEBUG: invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.158 CEST > LOG: fetching timeline history file for
timeline 4 from primary server
< 2017-05-28 09:29:51.160 CEST > LOG: started streaming WAL from primary
at 0/8000000 on timeline 3
< 2017-05-28 09:29:51.160 CEST > LOG: replication terminated by primary
server
< 2017-05-28 09:29:51.160 CEST > DETAIL: End of WAL reached on timeline 3
at 0/8000328.
< 2017-05-28 09:29:51.161 CEST > DEBUG: walreceiver ended streaming and
awaits new instructions
< 2017-05-28 09:29:51.161 CEST > LOG: new target timeline is 4
< 2017-05-28 09:29:51.161 CEST > DEBUG: invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.161 CEST > LOG: restarted WAL streaming at 0/8000000
on timeline 4
< 2017-05-28 09:29:51.191 CEST > LOG: redo starts at 0/8000328

> --
> Michael
>

--
Ludovic Vaugeois-Pepin

In response to

Re: wal_retrieve_retry_interval at 2017-05-28 04:54:10 from Michael Paquier

Browse pgsql-general by date

	From	Date	Subject
Next Message	Neil Anderson	2017-05-28 12:49:03	Help with terminology to describe what my software does please?
Previous Message	Michael Paquier	2017-05-28 04:54:10	Re: wal_retrieve_retry_interval