Re: Restricting maximum keep segments by repslots

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-09-07 12:59:56
Message-ID: 20170907.215956.110216588.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)141212(dot)227032666(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > I would like a flag in pg_replication_slots, and possibly also a
> > numerical column that indicates how far away from the critical point
> > each slot is. That would be great for a monitoring system.
>
> Great! I'll do that right now.

Done.

In the attached patch, which applies on top of the previous patch,
I added two columns to pg_replication_slots, "live" and "distance".
The first indicates whether the slot will "live" after the next
checkpoint. The second shows how many bytes the checkpoint LSN can
advance before the slot will "die", or how many bytes the slot has
lost after "death".

With wal_keep_segments = 1 and max_slot_wal_keep_size = 16MB:

=# select slot_name, restart_lsn, pg_current_wal_lsn(), live, distance from pg_replication_slots;

 slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
 s1        | 0/162D388   | 0/162D3C0          | t    | 0/29D2CE8

This shows that the checkpoint can advance 0x29d2ce8 bytes before the
slot dies, even if the connection stalls.
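
For readers who want the distance as a plain byte count, here is a minimal sketch of the arithmetic. It assumes only the usual textual LSN format (two hexadecimal halves separated by "/", the high 32 bits first). The helper names lsn_to_bytes and lsn_diff are made up for illustration and are not part of the patch.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '0/29D2CE8' to an integer byte position."""
    hi, lo = lsn.split("/")
    # The part before the slash holds the high 32 bits, the part after the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_diff(a: str, b: str) -> int:
    """Number of bytes between two LSNs (a - b)."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)


if __name__ == "__main__":
    # "distance" value from the first sample row above.
    remaining = lsn_to_bytes("0/29D2CE8")
    print(f"{remaining} bytes, about {remaining / (1024 * 1024):.1f} MiB")
    # WAL written since restart_lsn in the same row.
    print(lsn_diff("0/162D3C0", "0/162D388"), "bytes behind")
```

A monitoring script could apply the same conversion to the "distance" column to plot the remaining headroom over time.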

 s1        | 0/4001180   | 0/6FFF2B8          | t    | 0/DB8

Just before the slot loses sync.

 s1        | 0/4001180   | 0/70008A8          | f    | 0/FFEE80

The checkpoint after this removes some required segments.

2017-09-07 19:04:07.677 JST [13720] WARNING: restart LSN of replication slots is ignored by checkpoint
2017-09-07 19:04:07.677 JST [13720] DETAIL: Some replication slots have lost required WAL segnents to continue by up to 1 segments.

If max_slot_wal_keep_size is not set (0), live is always true and
distance is NULL.
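
As a rough illustration of how a monitoring system might interpret the two columns, including the NULL case, here is a small sketch. The function name slot_status, the status labels, and the 16MB warning threshold are all assumptions chosen for the example; none of them come from the patch.

```python
from typing import Optional


def slot_status(live: bool, distance: Optional[str],
                warn_bytes: int = 16 * 1024 * 1024) -> str:
    """Classify a slot from the proposed "live" and "distance" columns.

    distance is the textual LSN-style value from the view, or None for NULL.
    warn_bytes is a hypothetical alert threshold.
    """
    if not live:
        # The slot has already lost (or will lose) required WAL.
        return "lost"
    if distance is None:
        # max_slot_wal_keep_size is not set, so there is no limit to approach.
        return "unlimited"
    hi, lo = distance.split("/")
    remaining = (int(hi, 16) << 32) | int(lo, 16)
    return "warning" if remaining < warn_bytes else "ok"


if __name__ == "__main__":
    # (live, distance) pairs taken from the sample rows in this mail.
    for live, distance in [(True, "0/29D2CE8"), (True, "0/DB8"),
                           (False, "0/FFEE80"), (True, None)]:
        print(live, distance, slot_status(live, distance))
```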

 slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
 s1        | 0/4001180   | 0/73117A8          | t    |

- The names (or the contents) of the new columns are open to
  discussion.

- The pg_replication_slots view takes an LWLock on ControlFile and a
  spinlock on XLogCtl for every slot, but it seems difficult to
  reduce that.

- distance seems to mistakenly become 0/0 under certain conditions.

- The result seems almost right, but a more precise check is needed.
  (Anyway, it cannot be perfectly exact.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 7.2 KB
