Re: archive status ".ready" files may be created too early

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-03-15 16:34:29
Message-ID: E63E5670-6CC3-4B09-9686-A77CF94FE4A8@amazon.com
Lists: pgsql-hackers

On 2/18/21, 4:10 PM, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote:
> Alright, I've attached a new patch set for this.
>
> 0001 is similar to the last patch I sent in this thread, although it
> contains a few fixes. The main difference is that we no longer
> initialize lastNotifiedSeg in StartupXLOG(). Instead, we initialize
> it in XLogWrite() where we previously were creating the archive status
> files. This ensures that standby servers do not create many
> unnecessary archive status files after promotion.
>
> 0002 adds logic for persisting the last notified segment through
> crashes. This is needed because a poorly-timed crash could otherwise
> cause us to skip marking segments as ready-for-archival altogether.
> The file holding this position is only used on primary servers, since
> standbys have a separate code path for marking segments as
> ready-for-archival.
>
> I considered attempting to prevent this bug from affecting standby
> servers by withholding WAL for a segment until the previous segment
> has been marked ready-for-archival. However, that would require us to
> track record boundaries even with archiving turned off. Also, my
> patch relied on the assumption that the flush pointer advances along
> record boundaries except for records that span multiple segments.
> This assumption is likely not always true, and even if it is, it seems
> pretty fragile. Furthermore, I suspect that there are still problems
> with standbys since the code path responsible for creating archive
> status files on standbys has even less context about the WAL record
> boundaries. IMO patches 0001 and 0002 should be the focus for now,
> and related bugs for standby servers should be picked up in a new
> thread.
>
> I ended up not touching archive_timeout at all. The documentation for
> this parameter seems ambiguous enough that any small differences in
> behavior with these patches are still acceptable.
> I don't expect that users will see much change. In the worst case,
> the timer for archive_timeout may get reset a bit before the segment's
> archive status file is created.

I've attached a set of rebased patches.
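
For readers skimming the thread, the lazy initialization described for 0001
can be illustrated roughly as follows. This is only a sketch under stated
assumptions, not the code in the attached patches: apart from the name
lastNotifiedSeg, every identifier here (NotifyState, notify_on_write, the
"initialized" flag) is hypothetical, and the real logic also has to account
for WAL record boundaries, which this toy model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SegNo;

/* Hypothetical tracker; only the name lastNotifiedSeg comes from the thread. */
typedef struct NotifyState
{
    bool  initialized;      /* has the first write been seen yet? */
    SegNo lastNotifiedSeg;  /* last segment marked ready-for-archival */
} NotifyState;

/*
 * Called when segment finished_seg has been completely written.
 * Returns how many segments would be marked ready by this call.
 */
static int
notify_on_write(NotifyState *st, SegNo finished_seg)
{
    int notified = 0;

    if (!st->initialized)
    {
        /*
         * Assumption: on the first write, start tracking just before the
         * segment being finished, so that older segments (for example,
         * ones replayed before a promotion) are not notified again.
         */
        assert(finished_seg > 0);
        st->lastNotifiedSeg = finished_seg - 1;
        st->initialized = true;
    }

    /* Mark every segment between the last notified one and finished_seg. */
    while (st->lastNotifiedSeg < finished_seg)
    {
        st->lastNotifiedSeg++;
        notified++;         /* real code would create a .ready file here */
    }

    return notified;
}
```

In this model, a server whose first finished segment is 100 marks only that
one segment rather than everything from an earlier starting point, which is
the effect the quoted description of 0001 is after.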

Nathan

Attachment Content-Type Size
v2-0001-Avoid-creating-archive-status-.ready-files-too-ea.patch application/octet-stream 13.7 KB
v2-0002-Keep-track-of-notified-ready-for-archive-position.patch application/octet-stream 10.6 KB
