Re: [HACKERS] Restricting maximum keep segments by repslots

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 05:20:16
Message-ID: 20200331.142016.971756479150607227.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for looking this and trouble rebasing!

At Mon, 30 Mar 2020 20:03:27 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> I rebased this patch; it's failing to apply due to minor concurrent
> changes in PostgresNode.pm. I squashed the patches in a series that
> made the most sense to me.
>
> I have a question about static variable lastFoundOldestSeg in
> FindOldestXLogFileSegNo. It may be set the first time the function
> runs; if it is, the function never again does anything, it just returns
> that value. In other words, the static value is never reset; it never
> advances either. Isn't that strange? I think the coding is to assume
> that XLogCtl->lastRemovedSegNo will always be set, so its code will
> almost never run ... except when the very first wal file has not been
> removed yet. This seems weird and pointless. Maybe we should think
> about this differently -- example: if XLogGetLastRemovedSegno returns
> zero, then the oldest file is the zeroth one. In what cases this is
> wrong? Maybe we should fix those.

That's right, but without the static variable, every call to the
pg_replication_slots view before the fist checkpoint causes scanning
pg_xlog. XLogCtl->lastRemovedSegNo advances only at a checkpoint, so
it is actually right that the return value from
FindOldestXLogFileSegNo doesn't change until the first checkpoint.

Also we could set XLogCtl->lastRemovedSegNo at startup, but the
scanning on pg_xlog is useless in most cases.

I avoided to update the XLogCtl->lastRemovedSegNo directlry, but the
third way would be if XLogGetLastRemovedSegno() returned 0, then set
XLogCtl->lastRemovedSegNo by scanning the WAL directory. The attached
takes this way.

> Regarding the PostgresNode change in 0001, I think adding a special
> parameter for primary_slot_name is limited. I'd like to change the
> definition so that anything that you give as a parameter that's not one
> of the recognized keywords (has_streaming, etc) is tested to see if it's
> a GUC; and if it is, then put it in postgresql.conf. This would have to
> apply both to PostgresNode::init() as well as
> PostgresNode::init_from_backup(), obviously, since it would make no
> sense for the APIs to diverge on this point. So you'd be able to do
> $node->init_from_backup(allow_streaming => 1, work_mem => "4MB");
> without having to add code to init_from_backup to handle work_mem
> specifically. This could be done by having a Perl hash with all the GUC
> names, that we could read lazily from "postmaster --describe-config" the
> first time we see an unrecognized keyword as an option to init() /
> init_from_backup().

Done that way. We could exclude "known" parameters by explicitly
delete the key at reading it, but I choosed to enumerate the known
keywords. Although it can be used widely but actually I changed only
018_repslot_limit.pl to use the feature.

> I edited the doc changes a bit.
>
> I don't know what to think of 0003 yet. Has this been agreed to be a
> good idea?

So it was a separate patch. I think it has not been approved nor
rejected. The main objective of the patch is preventing
pg_replication_slots.wal_status from strange coming back from the
"lost" state to other states. However, in the first place I doubt that
it's right that logical replication sends the content of a WAL segment
already recycled.

> I also made a few small edits to the code; all cosmetic so far:
>
> * added long_desc to the new GUC; it now reads:
>
> {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
> gettext_noop("Sets the maximum size of WAL space reserved by replication slots."),
> gettext_noop("Replication slots will be marked as failed, and segments released "
> "for deletion or recycling, if this much space is occupied by WAL "
> "on disk."),
>
> * updated the comment to ConvertToXSegs() which is now being used for
> this purpose
>
> * remove outdated comment to GetWalAvailability; it was talking about
> restBytes parameter that no longer exists

Thank you for the fixes. All of the looks fine.

I fixed several typos. (s/requred/required/, s/devinitly/definitely/,
s/errror/error/)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v20-0001-Allow-arbitrary-GUC-parameter-setting-init-and-i.patch text/x-patch 2.7 KB
v20-0002-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 35.5 KB
v20-0003-Allow-init-and-init_from_backup-to-set-arbitrary.patch text/x-patch 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-03-31 05:36:03 Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
Previous Message Masahiko Sawada 2020-03-31 05:13:34 Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)