Re: archive status ".ready" files may be created too early

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-08-30 22:39:04
Message-ID: C6234E05-6391-485F-92C3-10F7EE4D9612@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/30/21, 2:06 PM, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
> When were you thinking of doing the latch sets? Adding a latch set for every
> XLogWrite() wouldn't be cheap either. Both because syscalls under a lock
> aren't free and because waking up walsender even more often isn't free (we
> already have a few threads about reducing the signalling frequency).
>
> There's also the question of what to do with single user mode. We shouldn't
> just skip creating .ready files there...

Good point.

> Although, the more I think about, the more I am confused about the trailing
> if (XLogArchivingActive())
> NotifySegmentsReadyForArchive(LogwrtResult.Flush);
>
> in XLogWrite(). Shouldn't that at the very least be inside the "If asked to
> flush, do so" branch? Outside that and the finishing_seg branch
> LogwrtResult.Flush won't have moved, right? So the call to
> NotifySegmentsReadyForArchive() can't do anything, no?

The registration logic looks like this:
1. Register boundary
2. Get flush location from shared memory
3. If flush location >= our just-registered boundary, nudge
the WAL writer to create .ready files if needed

If we called NotifySegmentsReadyForArchive() before we updated the
flush location in shared memory, we might skip nudging the WAL writer
even though we should.

> Nor does it seem like we'd ever need to call NotifySegmentsReadyForArchive()
> if we started writing on the current page - flushRecPtr can't move across a
> segment boundary in that case.

I think there is a chance that we've crossed one of our recorded
segment boundaries anytime the flush pointer moves.

> When s1 is fully written out, NotifySegmentsReadyForArchive() will set
> earliestSegBoundary = latestSegBoundary = s4 and create .reaady for .s1 -
> ok. But when when s2 is flushed, we'll afaict happily create .ready files
> for s2, s3 despite s4 not yet being written, because earliestSegBoundary is
> now s4.

In this case, the .ready files for s2 and s3 wouldn't be created until
s4 is flushed to disk.

> The whole approach here of delaying .ready creation for these types of
> segments seems wrong to me. Doesn't the exact same problem also exist for
> streaming rep - which one can also use to maintain a PITR archive? walsender
> sends up to the flush location, and pg_receivewal's FindStreamingStart() will
> afaict just continue receiving from after that point.

The problem with streaming replication is being discussed in a new
thread [0].

Nathan

[0] https://postgr.es/m/202108232252.dh7uxf6oxwcy%40alvherre.pgsql

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-08-30 23:08:42 Re: archive status ".ready" files may be created too early
Previous Message Zhihong Yu 2021-08-30 22:14:28 Re: Use extended statistics to estimate (Var op Var) clauses