Re: prevent immature WAL streaming

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amul Sul <sulamul(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "Jakub(dot)Wartak(at)tomtom(dot)com" <Jakub(dot)Wartak(at)tomtom(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Ryo Matsumura <matsumura(dot)ryo(at)fujitsu(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "mengjuan(dot)cmj(at)alibaba-inc(dot)com" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: prevent immature WAL streaming
Date: 2021-10-13 17:39:33
Message-ID: 202110131739.nldwmphytqcz@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-Oct-13, Robert Haas wrote:

> On Wed, Oct 13, 2021 at 2:01 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > Instead of abortedRecPtr point, isn't enough to write
> > overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think both
> > are pointing to the same location then can't we use
> > lastReplayedEndRecPtr instead of abortedRecPtr to write
> > overwrite-contrecord and remove need of extra global variable, like
> > attached?
>
> I think you mean missingContrecPtr, not abortedRecPtr. If I understand
> correctly, abortedRecPtr is going to be the location in some WAL
> segment which we replayed where a long record began, but
> missingContrecPtr seems like it would have to point to the beginning
> of the first segment we were unable to find to continue replay; and
> thus it ought to be the same as lastReplayedEndRecPtr.

So abortedRecPtr and missingContrecPtr both point to the same long
record: the former is the start of the record, and the latter is some
intermediate position where we failed to find the contrecord.
lastReplayedEndRecPtr is the end+1 of the record *prior* to the long
record.

> But the committed code doesn't seem to check that these are the same
> or verify the relationship between them in any way, so I'm worried
> there is some other case here.

Yeah, the only reason they are the same is that xlogreader sets both to
Invalid when reading a record, and then sets both when a read fails.

> The comments in XLogReadRecord also suggest this:
>
> * We get here when a record that spans multiple pages needs to be
> * assembled, but something went wrong -- perhaps a contrecord piece
> * was lost. If caller is WAL replay, it will know where the aborted
>
> Saying that "perhaps" a contrecord piece was lost seems to imply that
> other explanations are possible as well, but I'm not sure what.

Other explanations are possible. Imagine cosmic rays alter one byte in
the last contrecord. WAL replay will stop there, and the contrecord
will have been found all right, but CRC check would have failed to pass,
so we would set xlogreader->missingContrecPtr to the final contrecord of
that record (I didn't actually verify this particular scenario.)

In fact, anything that causes xlogreader.c:XLogReadRecord to return NULL
after setting "assembled=true" would set both abortedRecPtr and
missingContrecPtr -- except DecodeXLogRecord failure, which perhaps
should be handled in the same way.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-10-13 17:42:46 Re: [RFC] building postgres with meson
Previous Message Fujii Masao 2021-10-13 17:36:56 Re: Improve the HINT message of the ALTER command for postgres_fdw