Re: archive status ".ready" files may be created too early

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-08-31 17:01:31
Message-ID: 21D63383-9336-4B64-BF12-BFE33563D7A2@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/31/21, 12:44 AM, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
> If there's now a flush request including all of s3, we'll have the following
> sequence of notifies:
>
> NotifySegmentsReadyForArchive(s1)
> nothing happens, smaller than s1+10
>
> NotifySegmentsReadyForArchive(s2)
> earliestSegBoundary = s4
> earliestSegBoundaryEndPtr = s4+10
> latestSegBoundary = s4
> latestSegBoundaryEndPtr = s4 + 10
> latest_boundary_seg = s1
>
> NotifySegmentsReadyForArchive(s3)
> nothing happens, flush is smaller than s4

When earliestSegBoundary is set to s4, latestSegBoundary will be set
to MaxXLogSegNo.

> If the record ending at s4 + 10 isn't an async commit (and thus
> XLogCtl->asyncXactLSN is smaller), and there are no further records, we can
> end up waiting effectively forever for s2 (and s3) to be archived. If all
> other connections (and autovac etc) are idle, there will be no XLogFlush()
> calls, nor will XLogBackgroundFlush() do anything, because it'll hit the "If
> already known flushed" path, because the the first page in s4 is only
> partially filled.

I'm not following why s4 wouldn't be flushed in this example. Even if
the first page in s4 is only partially filled, that portion of the
record should still get flushed, and we'll create the .ready files for
s2 and s3 at that time. I tested this by adding some debug logging
and creating a small record that crossed segment boundaries but didn't
fill the first page on the next segment, and the .ready file was
created as expected. Is there a case where we wouldn't flush the end
of the record to disk?

During my testing, I did find an obvious bug. We probably shouldn't
be calling NotifySegmentsReadyForArchive() when archiving isn't
enabled.

diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 6a1e16edc2..8ca0d8e616 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -253,7 +253,8 @@ WalWriterMain(void)
* here to handle a race condition where WAL is flushed to disk prior
* to registering the segment boundary.
*/
- NotifySegmentsReadyForArchive(GetFlushRecPtr());
+ if (XLogArchivingActive())
+ NotifySegmentsReadyForArchive(GetFlushRecPtr());

/*
* Do what we're here for; then, if XLogBackgroundFlush() found useful

Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet Singh 2021-08-31 17:02:03 Re: Returning to Postgres community work
Previous Message Peter Eisentraut 2021-08-31 16:52:52 Re: [PATCH] pg_permissions