Re: BUG #17903: There is a bug in the KeepLogSeg()

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17903: There is a bug in the KeepLogSeg()
Date: 2023-04-20 07:40:14
Message-ID: CAEG8a3L2BB5k9USO4fC_wQjULGX-1rqaX2JSHn1_0vco8KB28Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

If `segno` can be larger than `currSegNo`, your patch seems to miss
the following branch,
are you missing this for some particular reason?

```
/* but, keep at least wal_keep_size if that's set */
if (wal_keep_size_mb > 0)
{
uint64 keep_segs;

keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
if (currSegNo - segno < keep_segs) <<<< see here
{
/* avoid underflow, don't go below 1 */
if (currSegNo <= keep_segs)
segno = 1;
else
segno = currSegNo - keep_segs;
}
}
```

On Thu, Apr 20, 2023 at 11:04 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 19 Apr 2023 10:26:13 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> > I found that KeepLogSeg() has a piece of code that is not correctly.
> >
> > segno may be larger than currSegNo, since the slot_keep_segs variable is of
> > type "uint64", in this case the code "if (currSegNo - segno >
> > slot_keep_segs)" is incorrect.
> >
> > "if (currSegNo - segno < keep_segs)" is also the same.
> >
> > Checkpoint calls the KeepLogSeg function, and there are many operations
> > between recptr and XLogGetReplicationSlotMinimumLSN, including updating the
> > pg_control file, so segno may be larger than currSegNo.
>
> Correct. Thanks for the report.
>
> If checkpointer somehow takes a long time between inserting a
> checkpoint record and removing WAL files, while replication advances a
> certain distnace, it can actually happen. Although that behavior
> doesn't directly affect max_slot_wal_keep_size, it does disrupt the
> effect of wal_keep_size.
>
> The thinko was that we incorrectly assumed the slot minimum LSN can't
> be larger than the checkpoint record LSN. We don't need to consider
> max_slot_wal_keep_size if the slot minimum LSN is already larger than
> currSegNo.
>
> The attached fix works. However, I can't come up with a reasonable
> testing script.
>
> This dates back to 13, where max_slot_wal_keep_size was introduced.
>
> regards.
>
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Daniel Gustafsson 2023-04-20 08:50:27 Re: pg_basebackup: errors on macOS on directories with ".DS_Store" files
Previous Message Daniel Gustafsson 2023-04-20 07:34:12 Re: pg_basebackup: errors on macOS on directories with ".DS_Store" files