Re: [HACKERS] Restricting maximum keep segments by repslots

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-13 08:40:04
Message-ID: 20180713.174004.249224160.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 11 Jul 2018 15:09:23 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCFtW6+SN_eVTszDAjQeeU2sSea2VpCEx08ejNafk8H9w(at)mail(dot)gmail(dot)com>
> On Mon, Jul 9, 2018 at 2:47 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
..
> Here is review comments of v4 patches.
>
> + if (minKeepLSN)
> + {
> + XLogRecPtr slotPtr = XLogGetReplicationSlotMinimumLSN();
> + Assert(!XLogRecPtrIsInvalid(slotPtr));
> +
> + tailSeg = GetOldestKeepSegment(currpos, slotPtr);
> +
> + XLogSegNoOffsetToRecPtr(tailSeg, 0, *minKeepLSN,
> wal_segment_size);
> + }
>
> The usage of XLogSegNoOffsetToRecPtr is wrong. Since we specify the
> destination at 4th argument the wal_segment_size will be changed in
> the above expression. The regression tests by PostgreSQL Patch Tester

I'm not sure I get this correctly, the definition of the macro is
as follows.

| #define XLogSegNoOffsetToRecPtr(segno, offset, dest, wal_segsz_bytes) \
| (dest) = (segno) * (wal_segsz_bytes) + (offset)

The destination is the *3rd* parameter and the forth is segment
size which is not to be written.

> seem passed but I got the following assertion failure in
> recovery/t/010_logical_decoding_timelines.pl
>
> TRAP: FailedAssertion("!(XLogRecPtrToBytePos(*StartPos) ==
> startbytepos)", File: "xlog.c", Line: 1277)

Hmm. I don't see a relation with this patch, but how did you
cause the failure? The failure means inconsistency between
existing XLogBytePosToRecPtr and XLogRecPtrToBytePos, which
doesn't seem to happen without modifying the two functions.

> ----
> + XLByteToSeg(restartLSN, restartSeg, wal_segment_size);
> +
> +
> + if (minKeepLSN)
> There is an extra empty line.
>
> ----
> + /* but, keep larger than wal_segment_size if any*/
> + if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
> + keepSegs = wal_keep_segments;
>
> You meant wal_keep_segments in the above comment rather than
> wal_segment_size? Also, the above comment need a whitespace just after
> "any".

Ouch! You're right. Fixed.

> ----
> +bool
> +IsLsnStillAvaiable(XLogRecPtr restartLSN, XLogRecPtr *minKeepLSN)
> +{
>
> I think restartLSN is a word used for replication slots. Since this
> function is defined in xlog.c it would be better to change the
> argument name to more generic name, for example recptr.

Agreed. I used "target" instead.

> ----
> + /*
> + * Calcualte keep segments by slots first. The second term of the
> + * condition is just a sanity check.
> + */
>
> s/calcualte/calculate/

Fixed.

> ----
> + /* get minimum segment ignorig timeline ID */
>
> s/ignorig/ignoring/

Fixed.

# One of my fingers is literally fatter with bandaid than usual..

> ----
> min_keep_lsn in pg_replication_slots currently shows the same value in
> every slots but I felt that the value seems not easy to understand
> intuitively for users because users will have to confirm that value
> and to compare the current LSN in order to check if replication slots
> will become the "lost" status. So how about showing values that
> indicate how far away from the point where we become "lost" for
> individual slots?

Yeah, that is what I did in the first cut of this patch from the
same thought. pg_replication_slots have two additional columns
"live" and "distance".

https://www.postgresql.org/message-id/20171031.184310.182012625.horiguchi.kyotaro@lab.ntt.co.jp

Thre current design is changed following a comment.

https://www.postgresql.org/message-id/20171108.131431.170534842.horiguchi.kyotaro%40lab.ntt.co.jp

> > I don't think 'distance' is a good metric - that's going to continually
> > change. Why not store the LSN that's available and provide a function
> > that computes this? Or just rely on the lsn - lsn operator?
>
> It seems reasonable.,The 'secured minimum LSN' is common among
> all slots so showing it in the view may look a bit stupid but I
> don't find another suitable place for it. distance = 0 meant the
> state that the slot is living but insecured in the previous patch
> and that information is lost by changing 'distance' to
> 'min_secure_lsn'.

As I reconsidered this, I noticed that "lsn - lsn" doesn't make
sense here. The correct formula for the value is
"max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
restart_lsn). It is not a simple formula to write by hand but
doesn't seem general enough. I re-changed my mind to show the
"distance" there again.

pg_replication_slots now has the column "remain" instaed of
"min_keep_lsn", which shows an LSN when wal_status is "streaming"
and otherwise "0/0". In a special case, "remain" can be "0/0"
while "wal_status" is "streaming". It is the reason for the
tristate return value of IsLsnStillAvaialbe().

wal_status | remain
streaming | 0/19E3C0 -- WAL is reserved
streaming | 0/0 -- Still reserved but on the boundary
keeping | 0/0 -- About to be lost.
lost | 0/0 -- Lost.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v5-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.5 KB
v5-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 11.9 KB
v5-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v5-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-07-13 08:46:23 Re: pgbench's expression parsing & negative numbers
Previous Message Fabien COELHO 2018-07-13 08:39:53 Re: Constraint documentation