Re: prevent immature WAL streaming

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "mengjuan(dot)cmj(at)alibaba-inc(dot)com" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, "Jakub(dot)Wartak(at)tomtom(dot)com" <Jakub(dot)Wartak(at)tomtom(dot)com>, Ryo Matsumura <matsumura(dot)ryo(at)fujitsu(dot)com>
Subject: Re: prevent immature WAL streaming
Date: 2021-09-24 17:38:45
Message-ID: CAMT0RQRUuGavA=-W4Na302us1U8hwFa0bAV=rc5rbPjR1C35gA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alvaro,

I just started reading this thread, but maybe you can confirm or
refute my understanding of what was done.

In the first email you write

> As mentioned in the course of thread [1], we're missing a fix for
streaming replication to avoid sending records that the primary hasn't
fully flushed yet. This patch is a first attempt at fixing that problem
by retreating the LSN reported as FlushPtr whenever a segment is
registered, based on the understanding that if no registration exists
then the LogwrtResult.Flush pointer can be taken at face value; but if a
registration exists, then we have to stream only till the start LSN of
that registered entry.

So did we end up holding back the wal_sender to not send anything that
is not confirmed as flushed on master

Are there measurements on how much this slows down replication
compared to allowing sending the moment it is written in buffers but
not necessarily flushed locally ?

Did we investigate possibility of sending as fast as possible and
controlling the flush synchronisation by sending separate flush
pointers *both* ways ?

And maybe there was even an alternative considered where we are
looking at a more general Durability, for example 2-out-of-3 where
primary is one of the 3 and not necessarily the most durable one?

-----
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.

On Fri, Sep 24, 2021 at 4:33 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2021-Sep-23, Alvaro Herrera wrote:
>
> > However, I notice now that the pg_rewind tests reproducibly fail in
> > branch 14 for reasons I haven't yet understood. It's strange that no
> > other branch fails, even when run quite a few times.
>
> Turns out that this is a real bug (setting EndOfLog seems insufficient).
> I'm looking into it.
>
> --
> Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
> "No necesitamos banderas
> No reconocemos fronteras" (Jorge González)
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tushar 2021-09-24 17:54:27 Re: extensible options syntax for replication parser?
Previous Message Robert Haas 2021-09-24 17:10:55 Re: Gather performance analysis