Re: prevent immature WAL streaming

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "Jakub(dot)Wartak(at)tomtom(dot)com" <Jakub(dot)Wartak(at)tomtom(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Ryo Matsumura <matsumura(dot)ryo(at)fujitsu(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "mengjuan(dot)cmj(at)alibaba-inc(dot)com" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: prevent immature WAL streaming
Date: 2021-10-13 17:14:32
Message-ID: CA+TgmobsoA3nm8kKnCoh7bMi35ntQ8hv1X5C4fFxdy0U+7kQpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 13, 2021 at 2:01 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> Instead of abortedRecPtr point, isn't enough to write
> overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think both
> are pointing to the same location then can't we use
> lastReplayedEndRecPtr instead of abortedRecPtr to write
> overwrite-contrecord and remove need of extra global variable, like
> attached?

I think you mean missingContrecPtr, not abortedRecPtr. If I understand
correctly, abortedRecPtr is going to be the location in some WAL
segment which we replayed where a long record began, but
missingContrecPtr seems like it would have to point to the beginning
of the first segment we were unable to find to continue replay; and
thus it ought to be the same as lastReplayedEndRecPtr. But the
committed code doesn't seem to check that these are the same or verify
the relationship between them in any way, so I'm worried there is some
other case here. The comments in XLogReadRecord also suggest this:

* We get here when a record that spans multiple pages needs to be
* assembled, but something went wrong -- perhaps a contrecord piece
* was lost. If caller is WAL replay, it will know where the aborted

Saying that "perhaps" a contrecord piece was lost seems to imply that
other explanations are possible as well, but I'm not sure what.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-10-13 17:19:36 Re: [RFC] building postgres with meson
Previous Message Robert Haas 2021-10-13 17:05:58 Re: pg14 psql broke \d datname.nspname.relname