Re: [GENERAL] pg_xlog on a hot_standby slave filling up

From: Lacey Powers <lacey(dot)leanne(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [GENERAL] pg_xlog on a hot_standby slave filling up
Date: 2015-06-16 22:12:36
Message-ID: 55809F54.5000402@gmail.com
Lists: pgsql-bugs pgsql-general

On 06/16/2015 01:16 PM, Jeff Frost wrote:
>> On Jun 16, 2015, at 11:35 AM, Christoph Berg <cb(at)df7cb(dot)de> wrote:
>>
>> [moving to -bugs]
>>
>> Re: Xavier 12 2015-06-16 <CAMOV8iB3oRzC4f7UTzOwC2wT08do3voi+PGN07uJq+ayo9E=cQ(at)mail(dot)gmail(dot)com>
>>> Hi everyone,
>>>
>>> Questions about pg_xlogs again...
>>> I have two PostgreSQL 9.1 servers in a master/slave streaming
>>> replication setup (hot_standby).
>>>
>>> Psql01 (master) is backed up with Barman and pg_xlog is correctly
>>> purged (archive_command is used).
>>>
>>> However, Psql02 (slave) has a huge pg_xlog (951 files, 15G for 7 days
>>> only, and it keeps growing until the disk is full). I have found
>>> documentation, tutorials, and mailing list posts, but I don't know what
>>> is suitable for a slave. Leads I've found:
>> Hi,
>>
>> I have the same problem here. Master/slave running on 9.3.current. On
>> the master everything is normal, but on the slave server, files in
>> pg_xlog and archive_status pile up. Interestingly, the filenames are
>> mostly 0x20 apart. (IRC user Kassandry is reporting the same issue on
>> 9.4 as well, including the 0x20 spacing.)
> I’ve seen this before, but haven’t been able to make a reproducible test case yet.
>
> Are you by chance using SSL to talk to the primary server? Is the ssl_renegotiation_limit the default of 512MB? 32 WAL files at 16MB each = 512MB. I found that it would always leave the WAL file from before the invalid record length message. Does that seem to be the case for you as well?
>
>
>

Hello Jeff,

To add to this from PostgreSQL 9.4 (I am Kassandry from IRC): yes, I see
SSL errors in my logs.

I turned off the archive_command I had running on one of my three
replicas, which recycled all of the .ready files and all of the
outstanding xlogs.

I re-enabled the archive_command and waited.

I got this in my logs:

< @[] LOG: restartpoint complete: wrote 15437 buffers (2.9%); 0 transaction log file(s) added, 0 removed, 5 recycled; write=269.358 s, sync=0.035 s, total=269.397 s; sync files=202, longest=0.008 s, average=0.000 s
< @[] LOG: recovery restart point at 650/4D01CCA0
< @[] DETAIL: last completed transaction was at log time 2015-06-16 21:41:41.990409+00
< @[] LOG: restartpoint starting: time
< @[] LOG: restartpoint complete: wrote 115 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 12 recycled; write=11.446 s, sync=0.005 s, total=11.455 s; sync files=29, longest=0.001 s, average=0.000 s
< @[] LOG: recovery restart point at 650/5204B6C8
< @[] DETAIL: last completed transaction was at log time 2015-06-16 21:42:24.524081+00
< @[] FATAL: could not send data to WAL stream: SSL error: unexpected record
< @[] LOG: unexpected pageaddr 650/18000000 in log segment 00000001000006500000005A, offset 0

A .ready file then appeared, and stayed, for 000000010000065000000059:

-rw------- 1 postgres postgres 0 Jun 16 21:57 000000010000065000000059.ready
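
The relationship between the two names (the log mentions segment ...5A, the stuck .ready file is for ...59) can be checked mechanically. Below is a minimal Python sketch, not anything PostgreSQL ships: the helper names split_segment and previous_segment are made up for illustration, and it assumes the default 16MB segment size on 9.3 or later, where the last eight hex digits run from 00 to FF (older releases such as 9.1 skip FF, as far as I know, so the rollover case would differ there).

```python
# Assumption: 16MB WAL segments on PostgreSQL 9.3+, so the segment part
# of the file name runs 0x00..0xFF within each "log" number.
SEGMENTS_PER_LOG = 0x100


def split_segment(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, seg) integers."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def previous_segment(name):
    """Return the WAL file name immediately preceding `name` on the same timeline."""
    timeline, log, seg = split_segment(name)
    number = log * SEGMENTS_PER_LOG + seg - 1
    if number < 0:
        raise ValueError("no segment precedes %r" % name)
    return "%08X%08X%08X" % (
        timeline,
        number // SEGMENTS_PER_LOG,
        number % SEGMENTS_PER_LOG,
    )


# Segment named in the "unexpected pageaddr" message above.
print(previous_segment("00000001000006500000005A"))
# -> 000000010000065000000059
```

Appending ".ready" to the result gives the file name to look for in archive_status.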

On my other streaming replica, there are lots of these log messages, and
it looks like there is also a .ready file for each segment immediately
preceding the one named in an unexpected pageaddr message.
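
To compare many such messages against the 0x20 spacing Christoph mentioned and the 32 x 16MB = 512MB figure from Jeff's question, something like the following rough Python sketch could be run over a replica's log. The function names are hypothetical, the regex is based only on the message format shown above, and the second file name in the spacing example is invented to illustrate a 0x20 gap; it is not taken from my logs. The 16MB segment size and the 00..FF segment range (9.3+) are assumptions as well.

```python
import re

WAL_SEGMENT_MB = 16       # assumption: default WAL segment size
SEGMENTS_PER_LOG = 0x100  # assumption: 9.3+ naming, segment part 00..FF

# Matches the message format shown above; \s+ tolerates mail line wrapping.
PAGEADDR_RE = re.compile(
    r"unexpected\s+pageaddr\s+\S+\s+in\s+log\s+segment\s+([0-9A-F]{24})"
)


def flagged_segments(log_text):
    """Return the WAL file names mentioned in 'unexpected pageaddr' messages."""
    return PAGEADDR_RE.findall(log_text)


def segment_number(name):
    """Map a WAL file name to a running segment counter (timeline ignored)."""
    return int(name[8:16], 16) * SEGMENTS_PER_LOG + int(name[16:24], 16)


def spacing_mb(names):
    """Return the distance in MB between consecutive segments, sorted by position."""
    numbers = sorted(segment_number(n) for n in names)
    return [(b - a) * WAL_SEGMENT_MB for a, b in zip(numbers, numbers[1:])]


sample = """< @[] LOG: unexpected pageaddr 650/18000000 in log segment
00000001000006500000005A, offset 0"""
print(flagged_segments(sample))
# -> ['00000001000006500000005A']

# Second name is hypothetical: 0x20 segments after the first.
print(spacing_mb(["00000001000006500000005A", "00000001000006500000007A"]))
# -> [512]
```

A list of values at or near 512 would be consistent with the stall lining up with the default ssl_renegotiation_limit; it would not prove the cause by itself.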

Hope this helps.

Please let me know if I can gather further data to help fix this. =)

Regards,

Lacey
