Re: Standby is not removing restored WAL segments

From: Eduardo Morras <emorrasg(at)yahoo(dot)es>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Standby is not removing restored WAL segments
Date: 2014-09-09 08:45:24
Message-ID: 20140909104524.9c158f39a3edff9166a376d0@yahoo.es
Lists: pgsql-admin

On Fri, 5 Sep 2014 09:33:57 +0200
Alexey Klyukin <alexk(at)hintbits(dot)com> wrote:

> Greetings,
>
> We've got a 9.3.5 server running in standby mode for a fairly large
> database (500GB) with busy WAL traffic (a couple of GB per hour), and it
> occasionally 'forgets' to remove the segments it has restored.
>
> checkpoint_segments is set to 128, and we usually observe around
> 270 segments accumulated, but when it happens our check triggers at
> around 2K segments. A manual CHECKPOINT takes ages to complete
> there, fast shutdown is very slow (around 10 minutes, versus usually
> less than 1 minute), and the WAL receiver process is also unable to
> run for some reason.
>
> The only way to make this host delete WAL files is to restart it. The
> restartpoint right after the shutdown is particularly notable: it
> shows quite a number of removed files and buffers written
> (shared_buffers is set to 8GB on this system):
>
> 2014-09-04 14:39:33.376 CEST,,,22354,,537a4553.5752,88217,,2014-05-19 19:54:27 CEST,,0,LOG,00000,"restartpoint complete: wrote 332473 buffers (31.7%); 0 transaction log file(s) added, 1237 removed, 6 recycled; write=9.745 s, sync=680.314 s, total=694.447 s; sync files=499, longest=37.774 s, average=1.363 s",,,,,,,,,""
>
> If we leave the host running, this restartpoint never happens.
>
> The only difference I can find between this host and the other
> databases that do not show this behavior is that it runs with
> max_standby_streaming_delay and max_standby_archive_delay set to -1,
> but no queries at all were running on it when we observed the
> problem.
>
> The problem occurs rarely but steadily, around once every 3 months.
> During this time PostgreSQL has been upgraded from 9.0 to 9.3,
> which did not solve the issue.
>
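
For scale: the PostgreSQL 9.x WAL configuration documentation says pg_xlog normally holds no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 segment files, each normally 16 MB. The minimal Python sketch below checks the numbers from the report against the larger of those two bounds. It assumes the default checkpoint_completion_target of 0.5, wal_keep_segments = 0 and 16 MB segments (none of these is stated in the thread), and the helper names are hypothetical. On a standby, files are removed at restartpoints (as the quoted log shows), so treat the bound as a rough guide rather than a hard limit.

```python
import os
import re

# WAL segment files in pg_xlog have 24 upper-case hex-digit names
# (timeline, log id, segment); .history/.backup files and archive_status are skipped.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")
SEGMENT_MB = 16  # default segment size; an assumption, not stated in the thread


def expected_max_segments(checkpoint_segments, completion_target=0.5, wal_keep_segments=0):
    """Larger of the two documented 'normal' bounds (defaults are assumptions)."""
    by_checkpoints = int((2 + completion_target) * checkpoint_segments) + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return max(by_checkpoints, by_keep)


def count_wal_segments(xlog_dir):
    """Count WAL segment files in a pg_xlog-style directory (hypothetical helper)."""
    return sum(1 for name in os.listdir(xlog_dir) if WAL_NAME.match(name))


ceiling = expected_max_segments(128)  # checkpoint_segments = 128, from the report
# 270 is the usual count; 2000 approximates the "around 2K" alert level.
for observed in (270, 2000):
    size_gb = observed * SEGMENT_MB / 1024
    verdict = "within" if observed <= ceiling else "far above"
    print(f"{observed} segments (~{size_gb:.1f} GB) is {verdict} the bound of {ceiling}")
```

Pointing count_wal_segments at the standby's pg_xlog directory gives the live count to compare against the same bound.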

Perhaps the deletion of the WAL files is attempted before, in filesystem time, the filesystem has closed the WAL file, so the delete fails with a "file still open" error.
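
One way to test this hypothesis on the standby host itself is to delete a file while a handle to it is still open and see whether the operating system refuses. The Python sketch below does only that (the helper name is hypothetical); the outcome depends on the OS and filesystem, and it does not reproduce PostgreSQL's own file handling, so it is a first check rather than a diagnosis.

```python
import os
import tempfile


def delete_while_open():
    """Try to delete a file while a handle to it is still open.

    Returns True if the OS allowed the delete, False if it refused.
    """
    fd, path = tempfile.mkstemp(suffix=".wal-test")
    try:
        os.write(fd, b"x" * 16)
        try:
            os.remove(path)  # delete while fd is still open
            return True
        except OSError as exc:  # the OS refused, e.g. file in use
            print(f"delete refused: {exc}")
            return False
    finally:
        os.close(fd)
        if os.path.exists(path):
            os.remove(path)  # clean up if the first delete was refused


print("delete while open allowed:", delete_while_open())
```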

> Any clues on how we can debug and diagnose the problem further to come
> up with a proper bug report, if it is a bug, or are we missing
> something in the configuration that causes this?
>
>
> Regards,
> --
> Alexey Klyukin
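
To gather data for a bug report, one option is to track when restartpoints complete and how many segments each one removes, since the quoted log shows WAL removal happening at the restartpoint. Below is a minimal Python sketch, assuming log_checkpoints is enabled and the server writes csvlog entries one per line, as in the quoted excerpt. The regular expression was written against that single sample line and may need adjusting for other versions or log formats; the function names are hypothetical.

```python
import re

# Pattern written against the 9.3 "restartpoint complete" message quoted above;
# other versions or log formats may word it differently.
RESTARTPOINT = re.compile(
    r"restartpoint complete: wrote (?P<buffers>\d+) buffers \([\d.]+%\); "
    r"(?P<added>\d+) transaction log file\(s\) added, (?P<removed>\d+) removed, "
    r"(?P<recycled>\d+) recycled; write=(?P<write>[\d.]+) s, "
    r"sync=(?P<sync>[\d.]+) s, total=(?P<total>[\d.]+) s"
)


def parse_restartpoint(line):
    """Return the counters from one log line, or None if it is not a restartpoint."""
    m = RESTARTPOINT.search(line)
    if m is None:
        return None
    return {
        "buffers": int(m["buffers"]),
        "added": int(m["added"]),
        "removed": int(m["removed"]),
        "recycled": int(m["recycled"]),
        "sync_s": float(m["sync"]),
        "total_s": float(m["total"]),
    }


def last_restartpoint(lines):
    """Return (timestamp, counters) for the last restartpoint in csvlog lines, or None."""
    last = None
    for line in lines:
        stats = parse_restartpoint(line)
        if stats is not None:
            # In csvlog format the first comma-separated field is the log time.
            last = (line.split(",", 1)[0], stats)
    return last


# The restartpoint entry quoted in the thread, rejoined onto one line.
sample = (
    '2014-09-04 14:39:33.376 CEST,,,22354,,537a4553.5752,88217,,'
    '2014-05-19 19:54:27 CEST,,0,LOG,00000,"restartpoint complete: wrote 332473 '
    'buffers (31.7%); 0 transaction log file(s) added, 1237 removed, 6 recycled; '
    'write=9.745 s, sync=680.314 s, total=694.447 s; sync files=499, '
    'longest=37.774 s, average=1.363 s",,,,,,,,,""'
)

print(last_restartpoint([sample]))
```

Run periodically over the standby's log, a growing gap since the last match while segments keep piling up would correspond to the "restartpoint never happens" symptom, and the sync times show how long each restartpoint spends flushing.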

--- ---
Eduardo Morras <emorrasg(at)yahoo(dot)es>
