Re: postgres wal sender replication timeout during pg_basebackup

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: 'Peter Brunnengräber' <pbrunnen(at)bccglobal(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: postgres wal sender replication timeout during pg_basebackup
Date: 2016-04-08 09:03:29
Message-ID: A737B7A37273E048B164557ADEF4A58B5383C01D@ntex2010i.host.magwien.gv.at
Lists: pgsql-admin

Peter Brunnengräber wrote:
> I brought the database files over from the current single production postgres server. By this I
> mean I shut down postgres, tar-ed up the data directory, and copied it over to the cluster's Master
> node. I put the files in place, set the permissions, and was able to start up postgres on the Master
> via corosync just fine.
>
> In preparing the slave, I used the pg_basebackup tool to bring the database over from the Master
> and this is where I keep having issues. As it is transferring, at about 57% I see the error:
>
> > $ pg_basebackup -h db-master -U u_repl -D /db/data/postgresql/9.2/main/ -X stream -P
> > pg_basebackup: could not receive data from WAL stream: SSL connection has been closed unexpectedly
> > 176472/176472 kB (100%), 1/1 tablespace
> > pg_basebackup: child process exited with error 1
>
> And on the server, I see:
>
> > 2016-04-06 21:05:31 UTC LOG: terminating walsender process due to replication timeout
>
> But the transfer doesn't stop and keeps going to completion.
>
> I found this [http://dba.stackexchange.com/questions/59916/streaming-replication-log-is-puzzling-me]
> question on stackexchange about setting "ssl_renegotiation_limit" to 0, but this didn't make much
> difference.
>
> Anyone have any ideas? I didn't find any reference to this problem in the mailing list archives. I
> am completely baffled as to why this would error, but keep on going. Maybe this isn't a problem at
> all? It is the same procedure I used in the lab setup... the only difference is that the production
> database is much bigger in size.

ssl_renegotiation_limit would also have been my first guess.
What PostgreSQL version are you running?

The server error message means that the client did not send a status update
within "wal_sender_timeout" milliseconds, see
http://www.postgresql.org/docs/current/static/runtime-config-replication.html#GUC-WAL-SENDER-TIMEOUT

Does pg_basebackup succeed if you set "wal_sender_timeout" to zero?
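
A minimal sketch of that test as a postgresql.conf fragment on the master. It
assumes the setting is changed in the file and the server is reloaded
afterwards. Note that the parameter is called "wal_sender_timeout" only from
PostgreSQL 9.3 on; in 9.1 and 9.2 the same setting is named
"replication_timeout". The data directory path in the report suggests 9.2, but
that is a guess until the version is confirmed.

```
# postgresql.conf on the master (sketch; reload the server after editing)

# PostgreSQL 9.3 and later:
wal_sender_timeout = 0        # 0 disables the timeout; the default is 60s

# PostgreSQL 9.1 and 9.2 use the older name for the same setting:
#replication_timeout = 0
```

Only one of the two names is valid on a given server version, so uncomment the
one that matches and leave the other out.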

Is there a firewall between client and server that could swallow such messages?

Could you try without SSL (e.g. set the environment variable PGSSLMODE to "disable")
and see if that makes the problem go away?
Avoiding SSL will also greatly speed up pg_basebackup.
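
One way to try this without changing the environment permanently is a small
wrapper, sketched below. The function name "basebackup_nossl" is made up for
illustration. The sketch also assumes that pg_hba.conf on the master permits a
non-SSL replication connection (a "host" entry rather than "hostssl");
otherwise the connection will be rejected.

```shell
# Hypothetical wrapper: run pg_basebackup with SSL disabled for this one call.
basebackup_nossl() {
    # PGSSLMODE is read by libpq; "disable" means only a non-SSL
    # connection is attempted.
    PGSSLMODE=disable pg_basebackup "$@"
}

# Usage, with the same arguments as in the original report:
# basebackup_nossl -h db-master -U u_repl -D /db/data/postgresql/9.2/main/ -X stream -P
```

Setting the variable only for the single command keeps later psql or
pg_basebackup runs in the same shell on their normal SSL settings.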

Yours,
Laurenz Albe
