Re: Restricting maximum keep segments by repslots

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-02-28 03:42:32
Message-ID: CAB7nPqQm0QetoShggQnn4bLFd9oXKKHG7RafBP3Krno62=ORww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 28, 2017 at 12:27 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Although replication slot is helpful to avoid unwanted WAL
> deletion, on the other hand it can cause a disastrous situation
> by keeping WAL segments without a limit. Removing the causal
> repslot will save this situation but it is not doable if the
> standby is active. We should do a rather complex and forcible
> steps to relieve the situation especially in an automatic
> manner. (As for me, specifically in an HA cluster.)
>
> This patch adds a GUC to put a limit to the number of segments
> that replication slots can keep. Hitting the limit during
> checkpoint shows a warining and the segments older than the limit
> are removed.
>
>> WARNING: restart LSN of replication slots is ignored by checkpoint
>> DETAIL: Some replication slots lose required WAL segnents to continue.
>
> Another measure would be automatic deletion or inactivation of
> the culprit slot but it seems too complex for the problem.
>
>
> As we have already postponed some patches by the triage for the
> last commit fest, this might should be postponed to PG11.

Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments as it is a
wart, and this applies as well for replication slots in my opinion. If
a slot is bloating WAL and you care about your Postgres instance, I
would recommend instead that you use a background worker that does
monitoring of the situation based on max_wal_size for example, killing
the WAL sender associated to the slot if there is something connected
but it is frozen or it cannot keep up the pace of WAL generation, and
then dropping the slot. You may want to issue a checkpoint in this
case as well to ensure that segments get recycled. But anyway, if you
reach this point of WAL bloat, perhaps that's for the best as users
would know about it because backups would get in danger. For some
applications, that is acceptable, but you could always rely on
monitoring slots and kill them on sight if needed. That's as well more
flexible than having a parameter that basically is just a synonym of
max_wal_size.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-02-28 03:50:40 Re: logical replication access control patches
Previous Message Michael Paquier 2017-02-28 03:29:56 Re: Partitioned tables and relfilenode