Re: [HACKERS] Restricting maximum keep segments by repslots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-24 07:47:41
Message-ID: CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
>> > As I reconsidered this, I noticed that "lsn - lsn" doesn't make
>> > sense here. The correct formula for the value is
>> > "max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
>> > restart_lsn). It is not a simple formula to write by hand but
>> > doesn't seem general enough. I re-changed my mind to show the
>> > "distance" there again.
>> >
>> > pg_replication_slots now has the column "remain" instaed of
>> > "min_keep_lsn", which shows an LSN when wal_status is "streaming"
>> > and otherwise "0/0". In a special case, "remain" can be "0/0"
>> > while "wal_status" is "streaming". It is the reason for the
>> > tristate return value of IsLsnStillAvaialbe().
>> >
>> > wal_status | remain
>> > streaming | 0/19E3C0 -- WAL is reserved
>> > streaming | 0/0 -- Still reserved but on the boundary
>> > keeping | 0/0 -- About to be lost.
>> > lost | 0/0 -- Lost.
>> >
>>
>> The "remain" column still shows same value at all rows as follows
>> because you always compare between the current LSN and the minimum LSN
>> of replication slots. Is that you expected? My comment was to show the
>
> Ouch! Sorry for the silly mistake. GetOldestKeepSegment should
> calculate restBytes based on the distance from the cutoff LSN to
> restart_lsn, not to minSlotLSN. The attached fixed v6 correctly
> shows the distance individually.
>
>> Also, I'm not sure it's a good way to show the distance as LSN. LSN is
>> a monotone increasing value but in your patch, a value of the "remain"
>> column can get decreased. As an alternative way I'd suggest to show it
>
> The LSN of WAL won't be decreased but an LSN is just a position
> in a WAL stream. Since the representation of LSN is composed of
> the two components 'file number' and 'offset', it's quite natural
> to show the difference in the same unit. The distance between the
> points at "6m" and "10m" is "4m".
>
>> as the number of segments. Attached patch is a patch for your v5 patch
>> that changes it so that the column shows how many WAL segments of
>> individual slots are remained until they get lost WAL.
>
> Segment size varies by configuration, so segment number is not
> intuitive to show distance. I think it is the most significant
> reason we move to "bytes" from "segments" about WAL sizings like
> max_wal_size. More than anything, it's too coarse. The required
> segments may be lasts for the time to consume a whole segment or
> may be removed just after. We could calculate the fragment bytes
> but it requires some internal knowledge.
>
> Instead, I made the field be shown in flat "bytes" using bigint,
> which can be nicely shown using pg_size_pretty;

Thank you for updating. I agree showing the remain in bytes.

Here is review comments for v6 patch.

@@ -967,9 +969,9 @@ postgres=# SELECT * FROM
pg_create_physical_replication_slot('node_a_slot');
node_a_slot |

postgres=# SELECT * FROM pg_replication_slots;
- slot_name | slot_type | datoid | database | active | xmin |
restart_lsn | confirmed_flush_lsn
--------------+-----------+--------+----------+--------+------+-------------+---------------------
- node_a_slot | physical | | | f | | |
+ slot_name | slot_type | datoid | database | active | xmin |
restart_lsn | confirmed_flush_lsn | wal_status | min_keep_lsn
+-------------+-----------+--------+----------+--------+------+-------------+---------------------+------------+--------------
+ node_a_slot | physical | | | f | |
| | unknown | 0/1000000

This funk should be updated.

-----
+/*
+ * Returns minimum segment number the next checktpoint must leave considering
+ * wal_keep_segments, replication slots and max_slot_wal_keep_size.
+ *
+ * If resetBytes is not NULL, returns remaining LSN bytes to advance until any
+ * slot loses reserving a WAL record.
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
XLogRecPtr restartLSN, uint64 *restBytes)
+{

You're assuming that the minSlotLSN is the minimum LSN of replication
slots but it's not mentioned anywhere. Since you check minSlotSeg <=
currSeg but not force it, if a caller sets a wrong value to minSlotLSN
this function will return a wrong value with no complaints. Similarly
there is not explanation about the resetartLSN, so you can add it. I'm
not sure the augment name restartLSN is suitable for the function in
xlog.c but I'd defer it to committers.

Since this function assumes that both restartLSN and *restBytes are
valid or invalid (and NULL) it's better to add assertions for safety.
The current code accepts even the case where only either argment is
valid.

-----
+ if (limitSegs > 0 && currSeg <= restartSeg + limitSegs)
+ {
+ /*
+ * This slot still has all required segments.
Calculate how many
+ * LSN bytes the slot has until it loses restart_lsn.
+ */
+ fragbytes = wal_segment_size - (currLSN %
wal_segment_size);
+ *restBytes =
+ (restartSeg + limitSegs - currSeg) *
wal_segment_size
+ + fragbytes;
+ }
+ }

This code doesn't consider the case where wal_keep_segments >
max_slot_keep_size. In the case I think we should use (currSeg -
wal_keep_segments) as the lower bound in order to avoid showing
"streaming" in the wal_status although the remain is 0.

-----
+ *restBytes =
+ (restartSeg + limitSegs - currSeg) *
wal_segment_size
+ + fragbytes;

Maybe you can use XLogSegNoOffsetToRecPtr instead.

-----
+ * 0 means that WAL record at targetLSN is alredy removed.
+ * 1 means that WAL record at tagetLSN is availble.
+ * 2 means that WAL record at tagetLSN is availble but about to be removed by

s/alredy/already/
s/tagetLSN/targetLSN/
s/availble/available/

-----
+ * If resetBytes is not NULL, returns remaining LSN bytes to advance until any
+ * slot loses reserving a WAL record.

s/resetBytes/restBytes/

-----
+ Specify the maximum size of WAL files
+ that <link linkend="streaming-replication-slots">replication
+ slots</link> are allowed to reatin in the <filename>pg_wal</filename>

s/reatin/retain/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message martinwerfel12 2018-07-24 07:52:06 Summary of plans to avoid the annoyance of Freezing
Previous Message David Rowley 2018-07-24 07:16:24 Re: FailedAssertion on partprune