postgres wal sender replication timeout during pg_basebackup

From: Peter Brunnengräber <pbrunnen(at)bccglobal(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: postgres wal sender replication timeout during pg_basebackup
Date: 2016-04-07 18:14:07
Message-ID: 2112705921.1067.1460052846289.JavaMail.pbrunnen@Station8.local
Lists: pgsql-admin

Hello all,
I had posted this to dba.stackexchange but haven't gotten any responses, so I thought this list might be more focused and a better place to ask.

I'll start by noting that I am still somewhat green with Postgres... One of our applications requires it, so I have been learning as I go...

Right now I am working on a Postgres 9.2 active/standby cluster on Debian wheezy to make the application more fault tolerant, based on the ClusterLabs pgsql cluster documentation [http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster].

In the lab, I am able to get this set up and working without a problem, but on the pre-production cluster I keep running into a WAL sync error.

I brought the database files over from the current single production Postgres server. By this I mean I shut down Postgres, tar-ed up the data directory, and copied it over to the cluster's Master node. I put the files in place, set the permissions, and was able to start up Postgres on the Master via corosync just fine.

In preparing the slave, I used the pg_basebackup tool to bring the database over from the Master, and this is where I keep having issues. While it is transferring, at about 57%, I see the error:

> $ pg_basebackup -h db-master -U u_repl -D /db/data/postgresql/9.2/main/ -X stream -P
> pg_basebackup: could not receive data from WAL stream: SSL connection has been closed unexpectedly
> 176472/176472 kB (100%), 1/1 tablespace
> pg_basebackup: child process exited with error 1

And on the server, I see:

> 2016-04-06 21:05:31 UTC LOG: terminating walsender process due to replication timeout

But the transfer doesn't stop and keeps going to completion.

I found this [http://dba.stackexchange.com/questions/59916/streaming-replication-log-is-puzzling-me] question on stackexchange about setting "ssl_renegotiation_limit" to 0, but this didn't make much difference.
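For reference, the settings involved can be sketched as a postgresql.conf fragment for the Master. This is only an illustration under assumptions, not a confirmed fix: ssl_renegotiation_limit is the setting from the linked question, and replication_timeout is the 9.2 parameter behind the "terminating walsender process due to replication timeout" log line (it was renamed wal_sender_timeout in 9.3). The values shown are examples, and whether changing either one resolves the error here is not established.

```
# postgresql.conf on the Master (PostgreSQL 9.2) -- illustrative values only

# Setting from the linked stackexchange question; 0 disables SSL renegotiation.
ssl_renegotiation_limit = 0

# Replication connections inactive for longer than this are terminated.
# 60s is the 9.2 default; 0 disables the timeout mechanism entirely.
replication_timeout = 60s
```

Both parameters can be changed with a configuration reload rather than a full restart; the current values can be checked with SHOW ssl_renegotiation_limit; and SHOW replication_timeout; in psql.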

Anyone have any ideas? I didn't find any reference to this problem in the mailing list archives. I am completely baffled as to why this would error, but keep on going. Maybe this isn't a problem at all? It is the same procedure I used in the lab setup... the only difference is that the production database is much bigger in size.

Any thoughts??

-With kind regards, Peter.
