Re: [HACKERS] Restricting maximum keep segments by repslots

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-12-20 07:24:38
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for piking this and sorry being late.

At Mon, 19 Nov 2018 13:39:58 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181119043958(dot)GE4400(at)paquier(dot)xyz>
> ereport should not be called within xlogreader.c as a base rule:

Ouch! I forgot that. Fixed to use report_invalid_record slightly
changing the message. The code is not required (or cannot be
used) on frontend so #ifndef FRONTENDed the code.

At Tue, 20 Nov 2018 14:07:44 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181120050744(dot)GJ4400(at)paquier(dot)xyz>
> On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> > I was just coming by to look at bit at the patch series, and bumped
> > into that:
> So I have been looking at the last patch series 0001-0004 posted on this
> thread, and coming from here:
> /* check that the slot is gone */
> SELECT * FROM pg_replication_slots
> It could be an idea to switch to the expanded mode here, not that it
> matters much still..

No problem doing that. Done.

TAP test complains that it still uses recovery.conf. Fixed. On
the way doing that I added parameter primary_slot_name to
init_from_backup in

> +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
> You mean Available here, not Avaiable. This function is only used when
> scanning for slot information with pg_replication_slots, so wouldn't it
> be better to just return the status string in this case?

Mmm. Sure. Auto-completion hid it from my eyes. Fixed the name.
The fix sounds reasonable. The function was created as returning
boolean and the name doen't fit the current function. I renamed
the name to GetLsnAvailability() that returns a string.

> Not sure I see the point of the "remain" field, which can be found with
> a simple calculation using the current insertion LSN, the segment size
> and the amount of WAL that the slot is retaining. It may be interesting
> to document a query to do that though.

It's not that simple. wal_segment_size, max_slot_wal_keep_size,
wal_keep_segments, max_slot_wal_keep_size and the current LSN are
invoved in the calculation which including several conditional
branches, maybe as you see upthread. We could show "the largest
current LSN until WAL is lost" but the "current LSN" is not shown
there. So it is showing the "remain".

> GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
> parallel, no? How is it safe to scan pg_wal on a process querying
> pg_replication_slots while another process may manipulate its contents
> (aka the checkpointer or just the startup process with an
> end-of-recovery checkpoint.). This routine relies on unsafe
> assumptions as this is not concurrent-safe. You can avoid problems by
> making sure instead that lastRemovedSegNo is initialized correctly at
> startup, which would be normally one segment older than what's in
> pg_wal, which feels a bit hacky to rely on to track the oldest segment.

Concurrent recycling makes the function's result vary between the
segment numbers before and after it. It is unstable but doesn't
matter so much. The reason for the timing is to avoid extra
startup time by a scan over pg_wal that is unncecessary in most

Anyway the attached patch initializes lastRemovedSegNo in

> It seems to me that GetOldestXLogFileSegNo() should also check for
> segments matching the current timeline, no?

RemoveOldXlogFiles() ignores timeline and the function is made to
behave the same way (in different manner). I added a comment for
the behavior in the function.

> + if (prev_lost_segs != lost_segs)
> + ereport(WARNING,
> + (errmsg ("some replication slots have lost
> required WAL segments"),
> + errdetail_plural(
> + "The mostly affected slot has lost %ld
> segment.",
> + "The mostly affected slot has lost %ld
> segments.",
> + lost_segs, lost_segs)));
> This can become very noisy with the time, and it would be actually
> useful to know which replication slot is impacted by that.

One message per one segment doen't seem so noisy. The reason for
not showing slot identifier individually is just to avoid
complexity comes from involving slot details. DBAs will see the
details in pg_stat_replication.

Anyway I did that in the attached patch. ReplicationSlotsBehind
returns the list of the slot names that behind specified
LSN. With this patch the messages looks as the follows:

WARNING: some replication slots have lost required WAL segments
DETAIL: Slot s1 lost 8 segment(s).
WARNING: some replication slots have lost required WAL segments
DETAIL: Slots s1, s2, s3 lost at most 9 segment(s).

> + slot doesn't have valid restart_lsn, this field
> Missing a determinant here, and restart_lsn should have a <literal>
> markup.

structfield? Reworded as below:

| non-negative. If <structfield>restart_lsn</structfield> is NULL, this
| field is <literal>unknown</literal>.

I changed "the slot" with "this slot" in the two added fields
(wal_status, remain).

> + many WAL segments that they fill up the space allotted
> s/allotted/allocated/.


> + available. The last two states are seen only when
> + <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
> + slot doesn't have valid restart_lsn, this field
> + is <literal>unknown</literal>.
> I am a bit confused by this statement. The last two states are "lost"
> and "keeping", but shouldn't "keeping" be the state showing up by
> default as it means that all WAL segments are kept around.

It's "streaming". I didn't came up with nice words to
distinguish the two states. I'm not sure "keep around" exactly
means but "keeping" here means rather "just not removed yet". The
states could be reworded as the follows:

streaming: kept/keeping/(secure, in the first version)
keeping : mortal/about to be removed
lost/unkown : (lost/unknown)

Do you have any better wording?

> +# Advance WAL by ten segments (= 160MB) on master
> +advance_wal($node_master, 10);
> +$node_master->safe_psql('postgres', "CHECKPOINT;");
> This makes the tests very costly, which is something we should avoid as
> much as possible. One trick which could be used here, on top of
> reducing the number of segment switches, is to use initdb
> --wal-segsize=1.

That sounds nice. Done. In the new version the number of segments
can be reduced and a new test item for the initial unkonwn state
as the first item.

Please find the attached new version.


Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v11-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 9.8 KB
v11-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.6 KB
v11-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.2 KB
v11-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.1 KB
v11-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB
v11-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.3 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Ideriha, Takeshi 2018-12-20 07:56:16 RE: Protect syscache from bloating with negative cache entries
Previous Message Gavin Flower 2018-12-20 07:17:28 Re: Using POPCNT and other advanced bit manipulation instructions