Re: prevent immature WAL streaming

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)alvh(dot)no-ip(dot)org
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, bossartn(at)amazon(dot)com, mengjuan(dot)cmj(at)alibaba-inc(dot)com, Jakub(dot)Wartak(at)tomtom(dot)com
Subject: Re: prevent immature WAL streaming
Date: 2021-08-24 03:03:57
Message-ID: 20210824.120357.1673176579644397801.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 23 Aug 2021 18:52:17 -0400, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote in
> Included 蔡梦娟 and Jakub Wartak because they've expressed interest on
> this topic -- notably [2] ("Bug on update timing of walrcv->flushedUpto
> variable").
>
> As mentioned in the course of thread [1], we're missing a fix for
> streaming replication to avoid sending records that the primary hasn't
> fully flushed yet. This patch is a first attempt at fixing that problem
> by retreating the LSN reported as FlushPtr whenever a segment is
> registered, based on the understanding that if no registration exists
> then the LogwrtResult.Flush pointer can be taken at face value; but if a
> registration exists, then we have to stream only till the start LSN of
> that registered entry.
>
> This patch is probably incomplete. First, I'm not sure that logical
> replication is affected by this problem. I think it isn't, because
> logical replication will halt until the record can be read completely --
> maybe I'm wrong and there is a way for things to go wrong with logical
> replication as well. But also, I need to look at the other uses of
> GetFlushRecPtr() and see if those need to change to the new function too
> or they can remain what they are now.
>
> I'd also like to have tests. That seems moderately hard, but if we had
> WAL-molasses that could be used in walreceiver, it could be done. (It
> sounds easier to write tests with a molasses-archive_command.)
>
>
> [1] https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
> [2] https://postgr.es/m/3f9c466d-d143-472c-a961-66406172af96.mengjuan.cmj@alibaba-inc.com

(I'm not sure what "WAL-molasses" above means. Is it the same as "sugar"?)

For reference, this issue is related to commit 0668719801, which
makes XLogPageRead restart reading a (continued or segment-spanning)
record when switching sources. In that thread, I modified the code to
cause a server crash in the desired situation.
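
As an illustration only, the rule described in the quoted mail (take the
flush pointer at face value when no registration exists, otherwise stream
only up to the start LSN of the registered entry) could be sketched as
below. This is not code from the patch: WalPos, BoundaryRegistration and
streamable_upto are hypothetical names, and the extra check that the
registered start LSN lies below the flush pointer is my own assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position (cf. XLogRecPtr). */
typedef uint64_t WalPos;

/* Hypothetical registration of a record that crosses a segment boundary. */
typedef struct
{
	bool		registered;	/* does a registration currently exist? */
	WalPos		start_lsn;	/* start LSN of the registered entry */
} BoundaryRegistration;

/*
 * Return the position up to which WAL may be streamed.
 *
 * With no registration, the flush pointer is used as-is.  With a
 * registration, streaming stops at the start LSN of the registered
 * entry.  The "start_lsn < flush_ptr" guard is an assumption added so
 * that the result never exceeds the flush pointer.
 */
static WalPos
streamable_upto(WalPos flush_ptr, const BoundaryRegistration *reg)
{
	if (reg->registered && reg->start_lsn < flush_ptr)
		return reg->start_lsn;

	return flush_ptr;
}
```

For example, with a flush pointer of 200 and a registered entry starting
at 100, the sketch allows streaming only up to 100.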

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
