Re: prevent immature WAL streaming

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, 蔡梦娟(玊于) <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
Subject: Re: prevent immature WAL streaming
Date: 2021-08-31 13:56:30
Message-ID: 202108311356.sl33wcpcz5x6@alvherre.pgsql
Lists: pgsql-hackers

On 2021-Aug-30, Andres Freund wrote:

> I'm doubtful that the approach of adding awareness of record boundaries
> is a good path to go down:

Honestly, I do not like it one bit, and if I can avoid relying on record
boundaries while making the whole thing work correctly, I am happy. Clearly it
wasn't a problem for the ancient recovery-only WAL design, but as soon
as we added replication on top, the whole issue of continuation records
became a bug.

I do think that the code should be first correct and second performant,
though.

> - There are very similar issues with promotions of replicas (consider
> what happens if we need to promote with the end of local WAL spanning
> a segment boundary, and what happens to cascading replicas). We have
> some logic to try to deal with that, but it's pretty grotty and I
> think incomplete.

Ouch, I hadn't thought of cascading replicas.

> - It seems to make some future optimizations harder - we should work
> towards replicating data sooner, rather than the opposite. Right now
> that's a major bottleneck around syncrep.

Absolutely.

> I think a better approach might be to handle this on the WAL layout
> level. What if we never overwrite partial records but instead just
> skipped over them during decoding?

Maybe this is a workable approach; let's work it out fully.

Let me see if I understand what you mean:
* We would remove the logic that inhibits archiving and streaming
replication of the tail end of a split WAL record; archiving and streaming
would then deal with bytes only, so they would not have to be aware of
record boundaries.
* On WAL replay, we ignore records that are split across a segment
boundary and whose checksum does not match.
* On WAL write ... ?

How do we detect after recovery that a record that was being written,
and potentially was sent to the archive, needs to be "skipped"?
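
To make the replay-side rule in the second bullet concrete, here is a
minimal toy sketch in Python. It is not PostgreSQL code: the Record type,
the 16-byte SEGMENT_SIZE, the helper names, and the use of zlib.crc32 as
the checksum are all assumptions made for illustration. It also assumes
that a checksum failure on a record that does not cross a segment boundary
is treated as the ordinary end of WAL. It says nothing about the write side
or about the detection question above, which remain open.

```python
import zlib
from dataclasses import dataclass

SEGMENT_SIZE = 16  # toy value (assumption); real WAL segments are far larger


@dataclass
class Record:
    start: int      # byte offset of the record in the toy WAL stream
    payload: bytes  # record body
    crc: int        # checksum stored alongside the record


def make_record(start: int, payload: bytes) -> Record:
    """Build a record whose stored checksum matches its payload."""
    return Record(start, payload, zlib.crc32(payload))


def checksum_ok(rec: Record) -> bool:
    return zlib.crc32(rec.payload) == rec.crc


def spans_segment_boundary(rec: Record, segment_size: int = SEGMENT_SIZE) -> bool:
    """True if the first and last bytes of the record lie in different segments."""
    if not rec.payload:
        return False
    last_byte = rec.start + len(rec.payload) - 1
    return rec.start // segment_size != last_byte // segment_size


def replay(records: list[Record]) -> list[bytes]:
    """Apply records in order, skipping split records whose checksum fails."""
    applied = []
    for rec in records:
        if checksum_ok(rec):
            applied.append(rec.payload)
        elif spans_segment_boundary(rec):
            # Torn record that crosses a segment boundary: ignore it and go on.
            continue
        else:
            # Checksum failure inside a single segment: treat as end of WAL.
            break
    return applied
```

In this toy model, a corrupt record that straddles a segment boundary is
passed over and later records are still applied, while a corrupt record
inside a single segment stops replay.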

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
