Re: [HACKERS] Restricting maximum keep segments by repslots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-10-29 08:37:52
Message-ID: CAD21AoDz2nUu-JyLxp5UTO19074C7bGdYJV-vDWLkaRxLAy8+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 25, 2018 at 9:56 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello.
>
> At Mon, 22 Oct 2018 19:35:04 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBdfoLSgujPZ_TpnH5zdQz0jg-Y8OXtZ=TCO787Sey-=w(at)mail(dot)gmail(dot)com>
> > On Thu, Sep 13, 2018 at 6:30 PM Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Sorry for the late response. The patch still can be applied to the
>
> It's alright. Thanks.
>
> > curent HEAD so I reviewed the latest patch.
> > The value of 'remain' and 'wal_status' might not be correct. Although
> > 'wal_stats' shows 'lost' but we can get changes from the slot. I've
> > tested it with the following steps.
> >
> > =# alter system set max_slot_wal_keep_size to '64MB'; -- while
> > wal_keep_segments is 0
> > =# select pg_reload_conf();
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> > slot_name | wal_status | remain | remain_pretty
> > -----------+------------+----------+---------------
> > 1 | streaming | 83885648 | 80 MB
> > (1 row)
> >
> > ** consume 80MB WAL, and do CHECKPOINT **
> >
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> > slot_name | wal_status | remain | remain_pretty
> > -----------+------------+--------+---------------
> > 1 | lost | 0 | 0 bytes
> > (1 row)
> > =# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
> > count
> > -------
> > 15
> > (1 row)
>
> Mmm. The function looks into the segment already open before
> losing the segment in the file system (precisely, its direcotory
> entry has been deleted). So just 1 lost segment doesn't
> matter. Please try losing more one segment.
>
> =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> ERROR: unexpected pageaddr 0/29000000 in log segment 000000010000000000000023, offset 0
>
> Or, instead just restarting will let the opened segment forgotten.
>
> ...
> > 1 | lost | 0 | 0 bytes
> (just restart)
> > =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> > ERROR: requested WAL segment pg_wal/000000010000000000000029 has already been removed
>
> I'm not sure this is counted to be a bug...
>
>
> > -----
> > I got the following result with setting of wal_keep_segments >
> > max_slot_keep_size. The 'wal_status' shows 'streaming' although the
> > 'remain' is 0.
> >
> > =# select slot_name, wal_status, remain from pg_replication_slots limit 1;
> > slot_name | wal_status | remain
> > -----------+------------+--------
> > 1 | streaming | 0
> > (1 row)
> >
> > + XLByteToSeg(targetLSN, restartSeg, wal_segment_size);
> > + if (max_slot_wal_keep_size_mb >= 0 && currSeg <=
> > restartSeg + limitSegs)
> > + {
> >
> > You use limitSegs here but shouldn't we use keepSeg instead? Actually
> > I've commented this point for v6 patch before[1], and this had been
> > fixed in the v7 patch. However you're using limitSegs again from v8
> > patch again. I might be missing something though.
>
> No. keepSegs is the number of segments *actually* kept around. So
> reverting it to keptSegs just resurrects the bug you pointed
> upthread. What needed here is at most how many segments will be
> kept. So raising limitSegs by wal_keep_segments fixes that.
> Sorry for the sequence of silly bugs. TAP test for the case
> added.
>

Thank you for updating the patch. The 0001 - 0004 patches work fine
and looks good to me except for the following comment for the code.

+ /*
+ * Calculate keep segments by slots first. The second term of the
+ * condition is just a sanity check.
+ */
+ if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+ keepSegs = currSeg - minSlotSeg;

I think that we can use assertion of the second term of the condition
instead of just checking. If the function get minSlotSeg > currSeg the
return value will be incorrect. That means that the function requires
the condition is always true. Thought?

Since this comment can be deferred to committers I've marked this
patch as "Ready for Committer". For 0005 patch the issue I reported is
a relatively rare issue and is not critical, we can discuss it after
this patch gets committed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2018-10-29 08:41:08 Re: pg_promote not marked as parallel-restricted in pg_proc.dat
Previous Message Michael Paquier 2018-10-29 08:25:30 pg_promote not marked as parallel-restricted in pg_proc.dat