From: | Ludovic Vaugeois-Pepin <ludovicvp(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Fwd: wal_retrieve_retry_interval |
Date: | 2017-05-28 08:07:05 |
Message-ID: | CAAJDx8NVFvdgucum_pEmmBYSc-togzx_E_d2nughMeTTV9zSAw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Sun, May 28, 2017 at 6:54 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:
> On Sat, May 27, 2017 at 2:40 PM, Ludovic Vaugeois-Pepin
> <ludovicvp(at)gmail(dot)com> wrote:
> > Say, with 9.6.2, a hot_standby fails to connect to a replication slot:
> > FATAL: could not start WAL streaming: ERROR: replication slot
> "test3"
> > does not exist
> > or
> > FATAL: could not connect to the primary server: FATAL: the database
> > system is starting up
> >
> > Is there a way to reduce the time it takes until the next attempt? I
> > assumed, wrongly I think, that this would be wal_retrieve_retry_interval,
> > but it seems that it won't make a difference. I tried setting it to 3s,
> but
> > it seems to take 15s still. Here are two log samples:
>
> Could you double-check your configuration? If I set
> wal_retrieve_retry_interval to 1s on a standby, I am able to see a
> connection attempt from a WAL receiver happening with this interval of
> time in the case of repetitive failures.
>
It really is set at 3s on all servers (master and standbies) earlier in the
"deployment" process at the same time "listen_addresses", "hot_standby",
and others are set. This doesn't seem to happen every time I run tests. I
increased logging to DEBUG1. This is what I got when the problem occurred
again. Note that the empty line is there in the log file.
< 2017-05-28 09:29:36.127 CEST > LOG: database system was shut down in
recovery at 2017-05-28 09:29:03 CEST
< 2017-05-28 09:29:36.127 CEST > LOG: entering standby mode
< 2017-05-28 09:29:36.127 CEST > DEBUG: checkpoint record is at 0/80002B8
< 2017-05-28 09:29:36.128 CEST > DEBUG: redo record is at 0/80002B8;
shutdown TRUE
< 2017-05-28 09:29:36.128 CEST > DEBUG: next transaction ID: 0:2535; next
OID: 18660
< 2017-05-28 09:29:36.128 CEST > DEBUG: next MultiXactId: 1; next
MultiXactOffset: 0
< 2017-05-28 09:29:36.128 CEST > DEBUG: oldest unfrozen transaction ID:
1750, in database 1
< 2017-05-28 09:29:36.128 CEST > DEBUG: oldest MultiXactId: 1, in database
1
< 2017-05-28 09:29:36.128 CEST > DEBUG: commit timestamp Xid
oldest/newest: 0/0
< 2017-05-28 09:29:36.128 CEST > DEBUG: transaction ID wrap limit is
2147485397 <(214)%20748-5397>, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG: MultiXactId wrap limit is
2147483648 <(214)%20748-3648>, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG: starting up replication slots
< 2017-05-28 09:29:36.128 CEST > DEBUG: resetting unlogged relations:
cleanup 1 init 0
< 2017-05-28 09:29:36.129 CEST > DEBUG: initializing for hot standby
< 2017-05-28 09:29:36.129 CEST > DEBUG: recovery snapshots are now enabled
< 2017-05-28 09:29:36.129 CEST > LOG: consistent recovery state reached at
0/8000328
< 2017-05-28 09:29:36.129 CEST > LOG: invalid record length at 0/8000328:
wanted 24, got 0
< 2017-05-28 09:29:36.129 CEST > LOG: database system is ready to accept
read only connections
< 2017-05-28 09:29:36.135 CEST > FATAL: could not connect to the primary
server: FATAL: the database system is starting up
< 2017-05-28 09:29:36.135 CEST > DEBUG: invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.153 CEST > DEBUG: invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.158 CEST > LOG: fetching timeline history file for
timeline 4 from primary server
< 2017-05-28 09:29:51.160 CEST > LOG: started streaming WAL from primary
at 0/8000000 on timeline 3
< 2017-05-28 09:29:51.160 CEST > LOG: replication terminated by primary
server
< 2017-05-28 09:29:51.160 CEST > DETAIL: End of WAL reached on timeline 3
at 0/8000328.
< 2017-05-28 09:29:51.161 CEST > DEBUG: walreceiver ended streaming and
awaits new instructions
< 2017-05-28 09:29:51.161 CEST > LOG: new target timeline is 4
< 2017-05-28 09:29:51.161 CEST > DEBUG: invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.161 CEST > LOG: restarted WAL streaming at 0/8000000
on timeline 4
< 2017-05-28 09:29:51.191 CEST > LOG: redo starts at 0/8000328
> --
> Michael
>
--
Ludovic Vaugeois-Pepin
--
Ludovic Vaugeois-Pepin
From | Date | Subject | |
---|---|---|---|
Next Message | Neil Anderson | 2017-05-28 12:49:03 | Help with terminology to describe what my software does please? |
Previous Message | Michael Paquier | 2017-05-28 04:54:10 | Re: wal_retrieve_retry_interval |