Re: archive status ".ready" files may be created too early

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bossartn(at)amazon(dot)com
Cc: alvherre(at)alvh(dot)no-ip(dot)org, x4mmm(at)yandex-team(dot)ru, a(dot)lubennikova(at)postgrespro(dot)ru, hlinnaka(at)iki(dot)fi, matsumura(dot)ryo(at)fujitsu(dot)com, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-08-05 01:57:57
Message-ID: 20210805.105757.2185619490399623846.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 3 Aug 2021 21:32:18 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> On 8/2/21, 7:37 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > I'm afraid that using hash to store boundary info is too much. Isn't a
> > ring buffer enough for this use? In that case it is enough to
> > remember only the end LSN of the segment spanning records. It is easy
> > to expand the buffer if needed.
>
> I agree that the hash table requires a bit more memory than what is
> probably necessary, but I'm not sure I agree that maintaining a custom
> data structure to save a few kilobytes of memory is worth the effort.

Memory is one of my concerns but more significant point was required
CPU cycles by GetLatestRecordBoundarySegment. So I don't mind it is
using a hash if the loop on the hash didn't block other backends.

Addition to that, while NotifySegmentsReadyForArchive() is notifying
pending segments, other backends simultaneously reach there are
blocked until the notification, incuding file creation, finishes. I
don't think that's great. Couldn't we set lastNotifiedSegment before
the loop? At the moment a backend decides to notify some segments,
others no longer need to consider those segments. Even if the backend
crashes meanwhile, as you mentionied below, it's safe since the
unnotified segments are notifed after restart.

> > @@ -2602,7 +2855,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
> > LogwrtResult.Flush = LogwrtResult.Write; /* end of page */
> >
> > if (XLogArchivingActive())
> > - XLogArchiveNotifySeg(openLogSegNo);
> > + SetLastNotifiedSegmentIfInvalid(openLogSegNo - 1);
> >
> > Is it safe? If server didn't notified of WAL files for recent 3
> > finished segments in the previous server life, they need to be
> > archived this life time. But this omits maybe all of the tree.
> > (I didn't confirm that behavior..)
>
> I tested this scenario out earlier [0]. It looks like the call to
> XLogArchiveCheckDone() in RemoveOldXlogFiles() will take care of
> creating any .ready files we missed.

Yeah, I reclled of that behvaior. In that case crash recovery reads up
to just before the last (continued) record in the last finished
segment. On the other hand if creash recovery was able to read that
record, it's safe to archive the last segment immediately after
recovery. So that behavior is safe. Thanks!

> >> I believe my worry was that we'd miss notifying a segment as soon as
> >> possible if the record was somehow flushed prior to registering the
> >> record boundary in the map. If that's actually impossible, then I
> >> would agree that the extra call to NotifySegmentsReadyForArchive() is
> >> unnecessary.
> >
> > I don't think that XLogWrite(up to LSN=X) can happen before
> > XLogInsert(endpos = X) ends.
>
> Is there anything preventing that from happening? At the location
> where we are registering the record boundary, we've already called
> CopyXLogRecordToWAL(), and neither the WAL insertion lock nor the
> WALWriteLock are held. Even if we register the boundary before
> updating the shared LogwrtRqst.Write, there's a chance that someone
> else has already moved it ahead and called XLogWrite(). I think the
> worst case scenario is that we hold off creating .ready files longer
> than necessary, but IMO that's still a worthwhile thing to do.

Oh, boundary registration happens actually after an insertion ends
(but before XLogInsert ends). The missed segment is never processed
due to the qualification by lastNotifiedSeg.

Does it work that RegisterRecordBoundaryEntry omits registering of the
bounary if it finds lastNotifiedSeg have gone too far?

> Nathan
>
> [0] https://postgr.es/m/DA71434B-7340-4984-9B91-F085BC47A778%40amazon.com
>

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-08-05 02:03:04 Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Previous Message Tomas Vondra 2021-08-04 23:21:18 Re: A varint implementation for PG?