Re: archive status ".ready" files may be created too early

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-02-19 00:08:04
Message-ID: 593A8B65-507E-4C53-9076-5420E429F03B@amazon.com
Lists: pgsql-hackers

Alright, I've attached a new patch set for this.

0001 is similar to the last patch I sent in this thread, although it
contains a few fixes. The main difference is that we no longer
initialize lastNotifiedSeg in StartupXLOG(). Instead, we initialize
it in XLogWrite() where we previously were creating the archive status
files. This ensures that standby servers do not create many
unnecessary archive status files after promotion.
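
As a rough illustration of the lazy initialization described above (this is not code from the patch), the sketch below seeds the tracking state the first time a segment is finished rather than at startup. The names NotifyState, next_to_notify and notify_finished_segments are hypothetical, and the sketch tracks the next segment to notify instead of lastNotifiedSeg purely to avoid an off-by-one at segment 0. The safe_upto argument stands in for "the latest segment whose trailing record is known to be fully flushed", which the real code would have to compute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SegNo;

/* Hypothetical tracking state; not the structure used by the patch. */
typedef struct
{
    bool  initialized;      /* false until the first segment is finished */
    SegNo next_to_notify;   /* next segment that still needs a .ready file */
} NotifyState;

/*
 * Called when the write path finishes segment "finished".  Returns how many
 * segments would be marked ready.  Segments beyond "safe_upto" are held back
 * because a record crossing their end has not been fully flushed yet.
 */
int
notify_finished_segments(NotifyState *st, SegNo finished, SegNo safe_upto)
{
    int created = 0;

    assert(st != NULL);

    /* Lazy initialization: start tracking at the first segment we finish. */
    if (!st->initialized)
    {
        st->next_to_notify = finished;
        st->initialized = true;
    }

    while (st->next_to_notify <= finished && st->next_to_notify <= safe_upto)
    {
        /* Real code would create the segment's .ready file here. */
        st->next_to_notify++;
        created++;
    }

    return created;
}
```

Because the state is seeded from the first segment actually finished by the write path, a server that only starts writing WAL after promotion begins tracking at its current segment instead of walking forward from whatever segment was current at startup.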

0002 adds logic for persisting the last notified segment through
crashes. This is needed because a poorly-timed crash could otherwise
cause us to skip marking segments as ready-for-archival altogether.
The file that stores this position is only used on primary servers, as
there is a separate code path for marking segments as
ready-for-archival on standbys.
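
For readers unfamiliar with the general technique, one common way to keep a small piece of state across crashes is to write it to a temporary file, fsync it, and rename it over the old copy. The sketch below shows that pattern with plain POSIX calls. It is an assumption-laden illustration, not the approach or file format used by 0002: the function names, the ".tmp" suffix and the raw 8-byte layout are all hypothetical, and a fully durable version would also fsync the containing directory after the rename.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace "path" with a file holding "seg". */
int
save_last_notified(const char *path, uint64_t seg)
{
    char tmp[1024];
    int  len;
    int  fd;

    assert(path != NULL);

    len = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (len < 0 || (size_t) len >= sizeof(tmp))
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write the value and force it to disk before it becomes visible. */
    if (write(fd, &seg, sizeof(seg)) != (ssize_t) sizeof(seg) ||
        fsync(fd) != 0)
    {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmp);
        return -1;
    }

    /* rename() swaps the new file in; readers see the old or the new copy. */
    if (rename(tmp, path) != 0)
    {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read the value back; -1 if missing or truncated. */
int
load_last_notified(const char *path, uint64_t *seg)
{
    ssize_t n;
    int     fd;

    assert(path != NULL && seg != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = read(fd, seg, sizeof(*seg));
    close(fd);
    return (n == (ssize_t) sizeof(*seg)) ? 0 : -1;
}
```

After a crash, a loader like this either finds the previous complete value or the new one, never a half-written file, which is the property needed to avoid skipping a segment.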

I considered attempting to prevent this bug from affecting standby
servers by withholding WAL for a segment until the previous segment
has been marked ready-for-archival. However, that would require us to
track record boundaries even with archiving turned off. Also, my
patch relied on the assumption that the flush pointer advances along
record boundaries except for records that span multiple segments.
This assumption is likely not always true, and even if it is, it seems
pretty fragile. Furthermore, I suspect that there are still problems
with standbys since the code path responsible for creating archive
status files on standbys has even less context about the WAL record
boundaries. IMO patches 0001 and 0002 should be the focus for now,
and related bugs for standby servers should be picked up in a new
thread.
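
To make "tracking record boundaries" concrete, here is a minimal sketch of the check involved, assuming the default 16 MB WAL segment size and treating LSNs as plain byte offsets. The type and function names are hypothetical and the logic is a simplification for illustration, not the patch's implementation: a segment is considered safe to mark ready only once it is fully flushed and, if the last record that starts in it spills into a later segment, once that whole record is flushed too.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;

/* Default WAL segment size (16 MB); configurable at initdb time. */
#define SEG_SIZE ((Lsn) 16 * 1024 * 1024)

/* Segment number containing the byte at "lsn". */
Lsn
seg_of(Lsn lsn)
{
    return lsn / SEG_SIZE;
}

/* A record occupying [start, end) crosses a boundary if its first and
 * last bytes fall in different segments. */
bool
record_crosses_segment(Lsn start, Lsn end)
{
    return seg_of(start) != seg_of(end - 1);
}

/*
 * Is segment "seg" safe to hand to the archiver, given the last record
 * that starts in it, [last_rec_start, last_rec_end), and the current
 * flush pointer?
 */
bool
segment_safe_to_notify(Lsn seg, Lsn last_rec_start, Lsn last_rec_end,
                       Lsn flushed)
{
    Lsn seg_end = (seg + 1) * SEG_SIZE;

    /* The segment itself must be completely flushed. */
    if (flushed < seg_end)
        return false;

    /* A record spilling past the segment end must be flushed in full. */
    if (seg_of(last_rec_start) == seg && last_rec_end > seg_end)
        return flushed >= last_rec_end;

    return true;
}
```

The middle case is the one the thread is about: the segment's bytes are all on disk, but the record that ends in the next segment is not, so a .ready file created at that moment would be too early.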

I ended up not touching archive_timeout at all. The documentation for
this parameter seems to be written ambiguously enough that any small
differences in behavior with these patches are still acceptable.
I don't expect that users will see much change. In the worst case,
the timer for archive_timeout may get reset a bit before the segment's
archive status file is created.

Nathan

Attachments:
  v1-0001-Avoid-creating-archive-status-.ready-files-too-ea.patch (application/octet-stream, 13.7 KB)
  v1-0002-Keep-track-of-notified-ready-for-archive-position.patch (application/octet-stream, 10.6 KB)