Re: [HACKERS] Restricting maximum keep segments by repslots

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-09 05:47:06
Message-ID: 20180709.144706.258526585.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello. Sawada-san.

Thank you for the comments.

At Thu, 5 Jul 2018 15:43:56 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDiiA4qHj0thqw80Bt=vefSQ9yGpZnr0kuLTXszbrV9iQ(at)mail(dot)gmail(dot)com>
> On Wed, Jul 4, 2018 at 5:28 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello.
> >
> > At Tue, 26 Jun 2018 16:26:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180626(dot)162659(dot)223208514(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> >> The previous patche files doesn't have version number so I let
> >> the attached latest version be v2.
> >>
> >>
> >> v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch
> >> The body of the limiting feature
> >>
> >> v2-0002-Add-monitoring-aid-for-max_replication_slots.patch
> >> Shows the status of WAL rataining in pg_replication_slot view
> >>
> >> v2-0003-TAP-test-for-the-slot-limit-feature.patch
> >> TAP test for this feature
> >>
> >> v2-0004-Documentation-for-slot-limit-feature.patch
> >> Documentation, as the name.
> >
> > Travis (test_decoding test) showed that GetOldestXLogFileSegNo
> > added by 0002 forgets to close temporarily opened pg_wal
> > directory. This is the fixed version v3.
> >
>
> Thank you for updating the patch! I looked at v3 patches. Here is
> review comments.
>
> ---
> + {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
> + gettext_noop("Sets the maximum size of extra
> WALs kept by replication slots."),
> + NULL,
> + GUC_UNIT_MB
> + },
> + &max_slot_wal_keep_size_mb,
> + 0, 0, INT_MAX,
> + NULL, NULL, NULL
> + },
>
> Maybe MAX_KILOBYTES would be better instead of INT_MAX like wal_max_size.

The MAX_KILOBYTES is maximum value of size in kB, which fits long
or Size/size_t variables after convreted into bytes. Applying the
limit there means that we assume that the _mb variable can be
converted not into bytes but kB. So applying it to
max/min_wal_size seems somewhat wrong but doesn't harm since they
are not acutually converted into bytes.

max_slot_wal_keep_size is not converted into bytes so capping
with INT_MAX is no problem. However it doesn't need to be larger
than MAX_KILOBYTES, I follow that in order to make it same with
max/min_wal_size.

> ---
> Once the following WARNING emitted this message is emitted whenever we
> execute CHECKPOINT even if we don't change anything. Is that expected
> behavior? I think it would be better to emit this message only when we
> remove WAL segements that are required by slots.
>
> WARNING: some replication slots have lost required WAL segments
> DETAIL: The mostly affected slot has lost 153 segments.

I didn't consider the situation the number of lost segments
doesn't change. Changed to mute the message when the number of
lost segments is not changed.

> ---
> + Assert(wal_keep_segments >= 0);
> + Assert(max_slot_wal_keep_size_mb >= 0);
>
> These assertions are not meaningful because these parameters are
> ensured >= 0 by those definition.

Yeah, that looks a bit being paranoid. Removed.

> ---
> + /* slots aren't useful, consider only wal_keep_segments */
> + if (slotpos == InvalidXLogRecPtr)
> + {
>
> Perhaps XLogRecPtrIsInvalid(slotpos) is better.

Agreed. It is changed to "slotpos != InvalidXLogRecPtr" after
changing the function by the comments below. I think that the
double negation !XLogRecPtrInvalid() is not fine.

> ---
> + if (slotpos != InvalidXLogRecPtr && currSeg <= slotSeg + wal_keep_segments)
> + slotpos = InvalidXLogRecPtr;
> +
> + /* slots aren't useful, consider only wal_keep_segments */
> + if (slotpos == InvalidXLogRecPtr)
> + {
>
> This logic is hard to read to me. The slotpos can be any of: valid,
> valid but then become invalid in halfway or invalid from beginning of
> this function. Can we convert this logic to following?
>
> if (XLogRecPtrIsInvalid(slotpos) ||
> currSeg <= slotSeg + wal_keep_segments)

Right. But it is removed.

> ---
> + keepSegs = wal_keep_segments +
> + ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
>
> Why do we need to keep (wal_keep_segment + max_slot_wal_keep_size) WAL
> segments? I think what this feature does is, if wal_keep_segments is
> not useful (that is, slotSeg < (currSeg - wal_keep_segment) then we
> normally choose slotSeg as lower boundary but max_slot_wal_keep_size
> restrict the lower boundary so that it doesn't get lower than the
> threshold. So I thought what this function should do is to calculate
> min(currSeg - wal_keep_segment, max(currSeg - max_slot_wal_keep_size,
> slotSeg)), I might be missing something though.

You're right that wal_keep_segments should not been added, but
should give lower limit of segments to keep as the current
KeepLogSeg() does. Fixed that.

Since the amount is specified in mega bytes, silently rounding
down to segment bounds may not be proper in general and this
feature used to use the fragments to show users something. But
there's no loner a place where the fragments are perceptible to
users and anyway the fragments are way smaller than the expected
total WAL size.

As the result, I removed the fragment calculation at all as you
suggested. It gets way smaller and simpler.

> ---
> + SpinLockAcquire(&XLogCtl->info_lck);
> + oldestSeg = XLogCtl->lastRemovedSegNo;
> + SpinLockRelease(&XLogCtl->info_lck);
>
> We can use XLogGetLastRemovedSegno() instead.

It is because I thought that it is for external usage,
spcifically by slot.c since CheckXLogRemoved() is reading it
directly. I leave it alone and they would have to be fixed at
once if we decide to use it internally.

> ---
> + xldir = AllocateDir(XLOGDIR);
> + if (xldir == NULL)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not open write-ahead log directory \"%s\": %m",
> + XLOGDIR)));
>
> Looking at other code allocating a directory we don't check xldir ==
> NULL because it will be detected by ReadDir() function and raise an
> error in that function. So maybe we don't need to check it just after
> allocation.

Thanks. I found that in the comment of ReadDir(). This doesn't
need a special error handling so I leave it to ReadDir there.

Addition to that, documentation is fixed.

Attached is the v4 files.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v4-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.5 KB
v4-0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 8.7 KB
v4-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v4-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2018-07-09 05:57:58 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Haribabu Kommi 2018-07-09 05:39:34 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query