Re: archive status ".ready" files may be created too early

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-08-31 02:38:39
Message-ID: 20210831023839.7brycllwyyapbrfv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-08-30 22:39:04 +0000, Bossart, Nathan wrote:
> On 8/30/21, 2:06 PM, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
> > When were you thinking of doing the latch sets? Adding a latch set for every
> > XLogWrite() wouldn't be cheap either. Both because syscalls under a lock
> > aren't free and because waking up walsender even more often isn't free (we
> > already have a few threads about reducing the signalling frequency).
> >
> > There's also the question of what to do with single user mode. We shouldn't
> > just skip creating .ready files there...
>
> Good point.
>
> > Although, the more I think about, the more I am confused about the trailing
> > if (XLogArchivingActive())
> > NotifySegmentsReadyForArchive(LogwrtResult.Flush);
> >
> > in XLogWrite(). Shouldn't that at the very least be inside the "If asked to
> > flush, do so" branch? Outside that and the finishing_seg branch
> > LogwrtResult.Flush won't have moved, right? So the call to
> > NotifySegmentsReadyForArchive() can't do anything, no?
>
> The registration logic looks like this:
> 1. Register boundary
> 2. Get flush location from shared memory
> 3. If flush location >= our just-registered boundary, nudge
> the WAL writer to create .ready files if needed
>
> If we called NotifySegmentsReadyForArchive() before we updated the
> flush location in shared memory, we might skip nudging the WAL writer
> even though we should.

That's trivial to address - just have a local variable saying whether we need
to call NotifySegmentsReadyForArchive().

Note that the finishing_seg path currently calls
NotifySegmentsReadyForArchive() before the shared memory flush location is
updated.

> > When s1 is fully written out, NotifySegmentsReadyForArchive() will set
> > earliestSegBoundary = latestSegBoundary = s4 and create .reaady for .s1 -
> > ok. But when when s2 is flushed, we'll afaict happily create .ready files
> > for s2, s3 despite s4 not yet being written, because earliestSegBoundary is
> > now s4.
>
> In this case, the .ready files for s2 and s3 wouldn't be created until
> s4 is flushed to disk.

I don't think that's true as the code stands today? The
NotifySegmentsReadyForArchive() for s2 will update earliestSegBoundary to s4,
because latestSegBoundary = 4 and earliestSegBoundary = 1, triggering the
keep_latest branch. Any subsequent NotifySegmentsReadyForArchive() with a
segment < 4 will then be able to flush s2 and s3?

> > The whole approach here of delaying .ready creation for these types of
> > segments seems wrong to me. Doesn't the exact same problem also exist for
> > streaming rep - which one can also use to maintain a PITR archive? walsender
> > sends up to the flush location, and pg_receivewal's FindStreamingStart() will
> > afaict just continue receiving from after that point.
>
> The problem with streaming replication is being discussed in a new
> thread [0].

I don't think it's sensible to fix these separately. It'd be one thing to do
that for HEAD, but on the back branches? And that this patch hasn't gotten any
performance testing is scary.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2021-08-31 02:55:35 Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead
Previous Message Masahiko Sawada 2021-08-31 02:37:08 Replication slot drop message is sent after pgstats shutdown.