Re: prevent immature WAL streaming

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)alvh(dot)no-ip(dot)org
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, andres(at)anarazel(dot)de, pgsql-hackers(at)lists(dot)postgresql(dot)org, bossartn(at)amazon(dot)com, mengjuan(dot)cmj(at)alibaba-inc(dot)com, Jakub(dot)Wartak(at)tomtom(dot)com
Subject: Re: prevent immature WAL streaming
Date: 2021-09-03 07:09:04
Message-ID: 20210903.160904.904317102157226316.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote in
> On 2021-Sep-02, Kyotaro Horiguchi wrote:
>
> > So, this is a crude PoC of that.
>
> I had ended up with something very similar, except I was trying to cram
> the flag via the checkpoint record instead of hacking
> AdvanceXLInsertBuffer(). I removed that stuff and merged both, here's
> the result.
>
> > 1. This patch is written on the current master, but it doesn't
> > interfere with the seg-boundary-memorize patch since it removes the
> > calls to RegisterSegmentBoundary.
>
> I rebased on top of the revert patch.

Thanks!

> > 2. Since xlogreader cannot emit a log message immediately, we have
> > no means of logging that recovery met an aborted partial
> > continuation record. (In this PoC, it is done by fprintf :p)
>
> Shrug. We can just use an #ifndef FRONTEND / elog(LOG). (I didn't keep
> this part, sorry.)

No problem, it was merely a development-time message for observing
the behavior.

> > 3. Maybe we need pg_waldump to show partial continuation records,
> > but I'm not sure how to do that.
>
> Ah yes, we'll need to fix that.

I just believe 0001 does the right thing.

0002:
> + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at end of
> + * WAL */

The name sounds like the start LSN. Doesn't contrecordAbort(ed)Ptr work?

>     if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
>     {
>         report_invalid_record(state,
>                               "there is no contrecord flag at %X/%X",
>                               LSN_FORMAT_ARGS(RecPtr));
> -       goto err;
> +       goto aborted_contrecord;

This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
XLP_FIRST_IS_ABORTED_PARTIAL. Is that okay? (I don't object to removing
the check.)

I didn't think of this as an aborted contrecord, but on second
thought, when we see a record broken in any way, we stop recovery at
that point. I agree with this change and all the similar changes.
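
For reference, the exclusion check mentioned above can be sketched
roughly as follows. This is only an illustration, not code from either
patch: the helper name is hypothetical, and both flag values are picked
for the example rather than taken from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of a page header's xlp_info; both values are assumptions
 * made for this sketch. */
#define XLP_FIRST_IS_CONTRECORD      0x0001
#define XLP_FIRST_IS_ABORTED_PARTIAL 0x0008

/*
 * Hypothetical helper: returns true unless the page header claims both
 * "first record continues from the previous page" and "the previous
 * partial record was aborted", which contradict each other.
 */
static bool
contrecord_flags_are_exclusive(uint16_t xlp_info)
{
    const uint16_t both = XLP_FIRST_IS_CONTRECORD | XLP_FIRST_IS_ABORTED_PARTIAL;

    /* The check is only meaningful if the two flags use distinct bits. */
    assert((XLP_FIRST_IS_CONTRECORD & XLP_FIRST_IS_ABORTED_PARTIAL) == 0);

    return (xlp_info & both) != both;
}
```

A page with neither flag, or with exactly one of them, passes; only a
page carrying both is rejected.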

+ /* XXX should we goto aborted_contrecord here? */

I think it should be aborted_contrecord.

When that happens, the loaded bytes actually looked like the first
fragment of a continuation record to xlogreader, even if the cause
was a broken total_len. So if we abort the record there, the next
time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL on the same
page and correctly find a new record there.

On the other hand, if we just errored out there, we would step back to
the beginning of the broken record in the previous page or segment and
start writing a new record there, but that is exactly what we want to
avoid now.
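
To make the difference between the two exits concrete, here is a toy
model of where new WAL would begin in each case. It is a sketch under
simplifying assumptions, not the patch's logic: the type and function
names are hypothetical, the page size is assumed to be 8192 bytes, page
headers are ignored, and the broken record's first fragment is assumed
to run to the end of its page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real code uses XLogRecPtr and XLOG_BLCKSZ. */
typedef uint64_t Lsn;
#define PAGE_SIZE 8192u     /* assumed WAL page size, a power of two */

typedef enum
{
    READ_PLAIN_ERROR,           /* models "goto err" */
    READ_ABORTED_CONTRECORD     /* models "goto aborted_contrecord" */
} ReadFailure;

/*
 * Given the start LSN of a record whose continuation could not be read,
 * return the LSN at which new WAL would start being written.
 */
static Lsn
resume_point(Lsn rec_start, ReadFailure failure)
{
    assert(failure == READ_PLAIN_ERROR || failure == READ_ABORTED_CONTRECORD);

    /* Plain error: step back and overwrite the broken record itself. */
    if (failure == READ_PLAIN_ERROR)
        return rec_start;

    /*
     * Aborted contrecord: leave the first fragment alone and resume at
     * the start of the following page, where the continuation was
     * expected (the page whose header would carry the aborted flag).
     */
    return (rec_start | (Lsn) (PAGE_SIZE - 1)) + 1;
}
```

For a broken record starting at offset 8000, the plain-error path
resumes at 8000 (rewriting bytes that may already have been streamed),
while the aborted-contrecord path resumes at 8192.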

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
