Re: postgres wal sender replication timeout during pg_basebackup

From: Peter Brunnengräber <pbrunnen(at)bccglobal(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-admin(at)postgresql(dot)org
Subject: Re: postgres wal sender replication timeout during pg_basebackup
Date: 2016-04-11 21:18:03
Message-ID: 109968546.289.1460409481048.JavaMail.pbrunnen@Station8.local
Lists: pgsql-admin

Hello Mr. Albe,

> What PostgreSQL version are you running?
9.2

> The server error message means that the client did not send a status
> update within "wal_sender_timeout" milliseconds

So if I understand this correctly, the wal sender must receive a message back from the receiver in this preset time or else think that the transmission failed...

9.2 doesn't seem to have the "wal_sender_timeout" parameter, and it appears that "replication_timeout" was the name of the parameter prior to v9.3, so this is what I am tweaking. I originally had "replication_timeout = 5s", and I verified that "wal_receiver_status_interval = 2s" per the documentation.

You were correct that setting this value to 0 did allow the pg_basebackup to complete without an error. I plan to also try setting this value to 15s to see if the pg_basebackup completes in that time frame.
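
For anyone following along on a different release, here is a minimal sketch of how the parameter name could be picked by server version. It assumes the rename from "replication_timeout" to "wal_sender_timeout" happened in 9.3, as discussed above. The function names walsender_timeout_guc and timeout_conf_line are hypothetical helpers for illustration only, not part of PostgreSQL.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): print the name of the
# walsender timeout parameter for a given server version string.
# Assumption: the parameter is called replication_timeout before 9.3
# and wal_sender_timeout from 9.3 onwards.
walsender_timeout_guc() {
    version=$1
    major=${version%%.*}      # "9.2.15" -> "9"
    rest=${version#*.}        # "9.2.15" -> "2.15"
    minor=${rest%%.*}         # "2.15"   -> "2"
    if [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -ge 3 ]; }; then
        echo wal_sender_timeout
    else
        echo replication_timeout
    fi
}

# Hypothetical helper: print a postgresql.conf line for the given
# version and timeout value (a value of 0 disables the timeout).
timeout_conf_line() {
    printf '%s = %s\n' "$(walsender_timeout_guc "$1")" "$2"
}
```

Calling `timeout_conf_line 9.2 15s` prints the line I intend to try next; the result still has to be placed in postgresql.conf and the server reloaded by hand.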

> Is there a firewall between client and server that could swallow such messages?
None that I am aware of, but I will check with the Xen Hypervisor admin to make sure there isn't something set up here that could also cause trouble down the road.

> Avoiding SSL will also greatly speed up pg_basebackup.
Ok. I will give this a try as well.
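
A minimal sketch of what that test run might look like, reusing the host, user, and target directory from the pg_basebackup command quoted below. PGSSLMODE is the libpq environment variable suggested in the reply; the wrapper function basebackup_nossl and the DRY_RUN switch are hypothetical conveniences, not part of any PostgreSQL tool.

```shell
#!/bin/sh
# Hypothetical wrapper: run the pg_basebackup command from this thread
# with SSL disabled through the PGSSLMODE environment variable.
# With DRY_RUN=1 (the default in this sketch) the command line is only
# printed; set DRY_RUN=0 to actually execute it.
basebackup_nossl() {
    set -- env PGSSLMODE=disable pg_basebackup \
        -h db-master -U u_repl \
        -D /db/data/postgresql/9.2/main/ -X stream -P
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Whether the server accepts a non-SSL replication connection depends on its pg_hba.conf entries, so this may need a matching "host" or "hostnossl" line on the Master.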

Thank you ever so much for your reply and solution, it was greatly appreciated!

With kind regards. -Peter

----- Original Message -----
From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Peter Brunnengräber" <pbrunnen(at)bccglobal(dot)com>, pgsql-admin(at)postgresql(dot)org
Sent: Friday, April 8, 2016 5:03:29 AM
Subject: Re: [ADMIN] postgres wal sender replication timeout during pg_basebackup

Peter Brunnengräber wrote:
> I brought the database files over from the current single production postgres server. By this I
> mean I shut down postgres, tar-ed up the data directory, and copied it over to the cluster's Master
> node. I put the files in place, set the permissions, and was able to start up postgres on the Master
> via corosync just fine.
>
> In preparing the slave, I used the pg_basebackup tool to bring the database over from the Master
> and this is where I keep having issues. As it is transferring, at about 57% I see the error:
>
> > $ pg_basebackup -h db-master -U u_repl -D /db/data/postgresql/9.2/main/ -X stream -P
> > pg_basebackup: could not receive data from WAL stream: SSL connection has been closed unexpectedly
> > 176472/176472 kB (100%), 1/1 tablespace
> > pg_basebackup: child process exited with error 1
>
> And on the server, I see:
>
> > 2016-04-06 21:05:31 UTC LOG: terminating walsender process due to replication timeout
>
> But the transfer doesn't stop and keeps going to completion.
>
> I found this [http://dba.stackexchange.com/questions/59916/streaming-replication-log-is-puzzling-me]
> question on stackexchange about setting "ssl_renegotiation_limit" to 0, but this didn't make much
> difference.
>
> Anyone have any ideas? I didn't find any reference to this problem in the mailing list archives. I
> am completely baffled as to why this would error, but keep on going. Maybe this isn't a problem at
> all? It is the same procedure I used in the lab setup... the only difference is that the production
> database is much bigger in size.

ssl_renegotiation_limit would also have been my first guess.
What PostgreSQL version are you running?

The server error message means that the client did not send a status update
within "wal_sender_timeout" milliseconds, see
http://www.postgresql.org/docs/current/static/runtime-config-replication.html#GUC-WAL-SENDER-TIMEOUT

Does pg_basebackup succeed if you set "wal_sender_timeout" to zero?

Is there a firewall between client and server that could swallow such messages?

Could you try without SSL (e.g. set the environment variable PGSSLMODE to "disable")
and see if that makes the problem go away?
Avoiding SSL will also greatly speed up pg_basebackup.

Yours,
Laurenz Albe

