Re: [HACKERS] Restricting maximum keep segments by repslots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-12 04:45:32
Message-ID: CAD21AoAg_Rnbc_6EQJ6-Q+1kdzym9vjEZxhSAZOhjYPh5-921g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 10, 2018 at 7:19 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Thu, 6 Sep 2018 19:55:39 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAZCdvdMN-vG4D_653vb_FN-AaMAP5+GXgF1JRjy+LeyA(at)mail(dot)gmail(dot)com>
>> On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Thank you for the comment.
>> >
>> > At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g(at)mail(dot)gmail(dot)com>
>> >> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
>> >> Thank you for updating! Here is the review comment for v8 patch.
>> >>
>> >> + /*
>> >> + * This slot still has all required segments. Calculate how many
>> >> + * LSN bytes the slot has until it loses restart_lsn.
>> >> + */
>> >> + fragbytes = wal_segment_size - (currLSN % wal_segment_size);
>> >> + XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg,
>> >> fragbytes,
>> >> + wal_segment_size, *restBytes);
>> >>
>> >> For the calculation of fragbytes, I think we should calculate the
>> >> fragment bytes of restartLSN instead. The formula "restartSeg +
>> >> limitSegs - currSeg" means the # of segments between restartLSN and the
>> >> limit set by the new parameter. I don't think that the sum of it and the
>> >> fragment bytes of currLSN is correct. As the following result
>> >> (max_slot_wal_keep_size is 128MB) shows, the remain column shows the
>> >> actual remains + 16MB (get_bytes function returns the value of
>> >> max_slot_wal_keep_size in bytes).
>> >
>> > Since the oldest segment is removed after the current LSN moves to
>> > the next segment, the current LSN naturally determines the fragment
>> > bytes. Maybe your concern is that the number of segments looks
>> > too large by one segment.
>> >
>> > One arguable point of the feature is how max_slot_wal_keep_size
>> > works exactly. I assume that even though the name starts with
>> > "max_", we actually expect that "at least that many bytes are
>> > kept". So, for example, with 16MB of segment size and 50MB of
>> > max_s_w_k_s, I designed this so that the size of preserved WAL
>> > doesn't go below 50MB, actually (rounding 50MB up to a multiple
>> > of 16MB), and loses the oldest segment when it reaches 64MB +
>> > 16MB = 80MB as you saw.
>> >
>> > # I believe that the difference is not so significant since we
>> > # have around a hundred or several hundred segments in common
>> > # cases.
>> >
>> > Do you mean that we should define the GUC parameter literally as
>> > "we won't have exactly that many bytes of WAL segments"? That is,
>> > we have at most 48MB preserved WAL records for 50MB of
>> > max_s_w_k_s setting. This is the same as how max_wal_size is
>> > counted but I don't think max_slot_wal_keep_size will be regarded
>> > in the same way.
>>
>> I might be missing something, but what I expect from this feature is
>> to restrict how much WAL we can keep at maximum for replication
>> slots. In other words, the distance between the current LSN and the
>> minimum restart_lsn of replication slots doesn't exceed the value of
>> max_slot_wal_keep_size.
>
> Yes, it's one possible design, the same as "we won't have more
> than exactly that many bytes of WAL segments" above ("more than"
> is added, which I meant). But anyway we cannot keep the limit
> strictly since WAL segments are removed only at checkpoint
> time.

Agreed. It should be something like a soft limit.

> So if we do so, we can reach the lost state before
> max_slot_wal_keep_size is filled up, while WAL can still exceed the
> size during a WAL flood. At most we can define it precisely as "WAL
> segments are preserved at most around the value". So I chose
> the definition so that we can describe it as "we don't
> guarantee more than that many bytes".

Agreed.

>
> # Uuuu. Sorry for the possibly hard-to-read sentences.
>
>> It's similar to wal_keep_segments except for
>> that this feature affects only replication slots. And
>
> It defines the *extra* segments to be kept, that is, if we set it
> to 2, at least 3 segments are present. If we set
> max_slot_wal_keep_size to 32MB (= 2 segs here), we have at most 3
> segments since the 32MB range before the current LSN almost always
> spans 3 segments. Doesn't this seem to work in a similar way
> to wal_keep_segments?

Yeah, that's fine with me. wal_keep_segments works regardless of the
existence of replication slots. If we have replication slots and set
both settings, we can reserve extra WAL as much as
max(wal_keep_segments, max_slot_wal_keep_size).
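
A minimal sketch of that upper bound, for illustration only. The helper
name is hypothetical (it is not a PostgreSQL function), and it assumes
that wal_keep_segments is converted to bytes by multiplying by the
segment size before the two settings are compared:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound of extra WAL reserved when both
 * settings are in effect, i.e. max(wal_keep_segments,
 * max_slot_wal_keep_size) with both sides expressed in bytes.
 */
uint64_t
extra_wal_reserved_bytes(uint64_t wal_keep_segments,
                         uint64_t max_slot_wal_keep_size_bytes,
                         uint64_t seg_size)
{
    assert(seg_size > 0);

    /* wal_keep_segments is a segment count; convert it to bytes */
    uint64_t keep_bytes = wal_keep_segments * seg_size;

    return keep_bytes > max_slot_wal_keep_size_bytes
        ? keep_bytes
        : max_slot_wal_keep_size_bytes;
}
```

With 16MB segments, wal_keep_segments = 2 and max_slot_wal_keep_size =
50MB give 50MB; raising wal_keep_segments to 8 gives 128MB.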

>
> If the current LSN is at the very end of a segment and
> restart_lsn is catching up to the current LSN, the "remain" is
> equal to max_slot_wal_keep_size as the guaranteed size. If it is at
> the very beginning of a segment, it gets an extra 16MB.

Agreed.
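
Both cases can be checked with a standalone sketch of the quoted v8
calculation. This is not the patch code: slot_remaining_bytes is a
hypothetical helper, segment numbers are assumed to be the LSN divided
by the segment size, and XLogSegNoOffsetToRecPtr is assumed to yield
segment number * segment size + offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted formula: bytes the slot has
 * left until it loses restart_lsn, for a slot that still has all of
 * its required segments.
 */
uint64_t
slot_remaining_bytes(uint64_t curr_lsn, uint64_t restart_lsn,
                     uint64_t limit_segs, uint64_t seg_size)
{
    assert(seg_size > 0);

    uint64_t curr_seg = curr_lsn / seg_size;
    uint64_t restart_seg = restart_lsn / seg_size;

    /* Precondition from the quoted comment: nothing has been lost yet */
    assert(restart_seg + limit_segs >= curr_seg);

    /* Bytes from the current LSN to the end of its segment */
    uint64_t fragbytes = seg_size - (curr_lsn % seg_size);

    return (restart_seg + limit_segs - curr_seg) * seg_size + fragbytes;
}
```

With 16MB segments and a limit of 2 segments (32MB), a caught-up slot
gets 48MB when the current LSN is at the first byte of a segment, and
32MB plus one byte when it is at the last byte of a segment.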

>
>> wal_keep_segments cannot restrict WAL that replication slots are
>> holding. For example, with 16MB of segment size and 50MB of
>> max_slot_wal_keep_size, we can keep at most 50MB WAL for replication
>> slots. However, once we consumed more than 50MB WAL while not
>> advancing any restart_lsn the required WAL might be lost by the next
>> checkpoint, which depends on the min_wal_size.
>
> I don't get the last phrase. With a small min_wal_size, we don't
> recycle most of the "removed" segments. If it is large, we recycle
> more of them. It doesn't affect up to where the checkpoint removes
> WAL files. But it is right that an LSN advance of
> max_slot_wal_keep_size bytes immediately leads to breaking a
> slot, and that is the intended behavior.

Sorry, I was wrong. Please ignore the last sentence. What I wanted to
say is that there is no guarantee that the required WAL is kept once
the extra WAL reserved by replication slots exceeds the threshold.
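
As a rough sketch of that condition (the helper name is hypothetical,
and it deliberately ignores both the rounding to whole segments and the
fact that segments are only removed at checkpoint time):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once the WAL held back by replication
 * slots exceeds the threshold, i.e. the required WAL is no longer
 * guaranteed to survive the next checkpoint.
 */
bool
slot_wal_may_be_lost(uint64_t curr_lsn, uint64_t min_restart_lsn,
                     uint64_t max_slot_wal_keep_size_bytes)
{
    /* restart_lsn never runs ahead of the current LSN */
    assert(curr_lsn >= min_restart_lsn);

    return curr_lsn - min_restart_lsn > max_slot_wal_keep_size_bytes;
}
```

With a 50MB setting, a slot that is 40MB behind is still covered, while
a slot that is 60MB behind is not.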

>
>> On the other hand, if
>> we mostly can advance restart_lsn to approximately the current LSN the
>> size of preserved WAL for replication slots can go below 50MB.
>
> Y..eah.. That's right. It is just how this works. But I don't
> understand how this is related to the interpretation of the "max"
> of the GUC variable.

When I wrote this, I understood the following sentence to mean that we
always keep at least max_slot_wal_keep_size bytes regardless of the
progress of the minimum restart_lsn. I might have misunderstood,
though.

>> > I assume that even though the name starts with
>> > "max_", we actually expect that "at least that many bytes are
>> > kept".
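
The two readings of the setting discussed in this thread differ in
which way the value is rounded to whole segments. A minimal sketch,
with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* "At least that many bytes are kept": round up to whole segments */
uint64_t
round_up_to_segment(uint64_t bytes, uint64_t seg_size)
{
    assert(seg_size > 0);
    return (bytes + seg_size - 1) / seg_size * seg_size;
}

/* "At most that many bytes are kept": round down to whole segments */
uint64_t
round_down_to_segment(uint64_t bytes, uint64_t seg_size)
{
    assert(seg_size > 0);
    return bytes / seg_size * seg_size;
}
```

With 16MB segments and a 50MB setting, rounding up gives the 64MB of
the "at least" reading, and rounding down gives the 48MB of the "at
most" reading.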

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
