Re: [HACKERS] Restricting maximum keep segments by repslots

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-23 07:16:18
Message-ID: 20180723.161618.46636100.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello.

At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
> > As I reconsidered this, I noticed that "lsn - lsn" doesn't make
> > sense here. The correct formula for the value is
> > "max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
> > restart_lsn). It is not a simple formula to write by hand but
> > doesn't seem general enough. I re-changed my mind to show the
> > "distance" there again.
> >
> > pg_replication_slots now has the column "remain" instaed of
> > "min_keep_lsn", which shows an LSN when wal_status is "streaming"
> > and otherwise "0/0". In a special case, "remain" can be "0/0"
> > while "wal_status" is "streaming". It is the reason for the
> > tristate return value of IsLsnStillAvaialbe().
> >
> > wal_status | remain
> > streaming | 0/19E3C0 -- WAL is reserved
> > streaming | 0/0 -- Still reserved but on the boundary
> > keeping | 0/0 -- About to be lost.
> > lost | 0/0 -- Lost.
> >
>
> The "remain" column still shows same value at all rows as follows
> because you always compare between the current LSN and the minimum LSN
> of replication slots. Is that you expected? My comment was to show the

Ouch! Sorry for the silly mistake. GetOldestKeepSegment should
calculate restBytes based on the distance from the cutoff LSN to
restart_lsn, not to minSlotLSN. The attached fixed v6 correctly
shows the distance individually.

> Also, I'm not sure it's a good way to show the distance as LSN. LSN is
> a monotone increasing value but in your patch, a value of the "remain"
> column can get decreased. As an alternative way I'd suggest to show it

The LSN of WAL won't be decreased but an LSN is just a position
in a WAL stream. Since the representation of LSN is composed of
the two components 'file number' and 'offset', it's quite natural
to show the difference in the same unit. The distance between the
points at "6m" and "10m" is "4m".

> as the number of segments. Attached patch is a patch for your v5 patch
> that changes it so that the column shows how many WAL segments of
> individual slots are remained until they get lost WAL.

Segment size varies by configuration, so segment number is not
intuitive to show distance. I think it is the most significant
reason we move to "bytes" from "segments" about WAL sizings like
max_wal_size. More than anything, it's too coarse. The required
segments may be lasts for the time to consume a whole segment or
may be removed just after. We could calculate the fragment bytes
but it requires some internal knowledge.

Instead, I made the field be shown in flat "bytes" using bigint,
which can be nicely shown using pg_size_pretty;

=# select pg_current_wal_lsn(), restart_lsn, wal_status, pg_size_pretty(remain) as remain from pg_replication_slots ;
pg_current_wal_lsn | restart_lsn | wal_status | remain
--------------------+-------------+------------+--------
0/DD3B188 | 0/CADD618 | streaming | 19 MB
0/DD3B188 | 0/DD3B188 | streaming | 35 MB

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v6-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.5 KB
v6-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.2 KB
v6-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v6-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-07-23 07:21:30 Re: [bug fix] Produce a crash dump before main() on Windows
Previous Message Michael Paquier 2018-07-23 07:06:44 Re: Memory leak with CALL to Procedure with COMMIT.