Re: archive status ".ready" files may be created too early

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: matsumura(dot)ryo(at)fujitsu(dot)com
Cc: bossartn(at)amazon(dot)com, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive status ".ready" files may be created too early
Date: 2020-07-06 05:13:40
Message-ID: 20200706.141340.695077532820840879.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Matsumura-san.

At Mon, 6 Jul 2020 04:02:23 +0000, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com> wrote in
> Hello, Horiguchi-san
>
> Thank you for your comment and patch.
>
> At Thursday, June 25, 2020 3:36 PM(JST), "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in
> > I think we don't need most of that shmem stuff. XLogWrite is called
>
> I wanted no more shmem stuff too, but other ideas need more lock
> that excludes inserter and writer each other.
>
> > after WAL buffer is filled up to the requested position. So when it
> > crosses segment boundary we know the all past corss segment-boundary
> > records are stable. That means all we need to remember is only the
> > position of the latest corss-boundary record.
>
> I could not agree. In the following case, it may not work well.
> - record-A and record-B (record-B is a newer one) is copied, and
> - lastSegContRecStart/End points to record-B's, and
> - FlushPtr is proceeded to in the middle of record-A.

IIUC, that means record-B is a cross segment-border record and we hav e
flushed beyond the recrod-B. In that case crash recovery afterwards
can read the complete record-B and will finish recovery *after* the
record-B. That's what we need here.

> In the above case, the writer should notify segments before record-A,
> but it notifies ones before record-B. If the writer notifies

If you mean that NotifyStableSegments notifies up-to the previous
segment of the segment where record-A is placed, that's wrong. The
issue here is crash recovery sees an incomplete record at a
segment-border. So it is sufficient that crash recoery can read the
last record by looking pg_wal.

> only when it flushes the latest record completely, it works well.

It confirms that "lastSegContRecEnd < LogwrtResult.Flush", that means
the last record(B) is completely flushed-out, isn't that? So it works
well.

> But the writer may not be enable to notify any segment forever when
> WAL records crossing-segment-boundary are inserted contiunuously.

No. As I mentioned in the preivous main, if we see a
cross-segment-boundary record, the previous cross-segment-boundary
record is flushed completely, and the segment containing the
first-half of the previous cross-segment-boundary record has already
been flushed. I didin't that but we can put an assertion in
XLogInsertRecord like this:

+ /* Remember the range of the record if it spans over segments */
+ XLByteToSeg(StartPos, startseg, wal_segment_size);
+ XLByteToPrevSeg(EndPos, endseg, wal_segment_size);
+
+ if (startseg != endseg)
+ {
++ /* we shouldn't have a record spanning over three or more segments */
++ Assert(endseg = startseg + 1);
+ SpinLockAcquire(&XLogCtl->info_lck);
+ if (XLogCtl->lastSegContRecEnd < StartPos)
+ {
+ XLogCtl->lastSegContRecStart = StartPos;
+ XLogCtl->lastSegContRecEnd = EndPos;

> So I think that we must remeber all such cross-segement-boundary records's EndRecPtr in buffer.
>
>
> > If we call XLogMarkEndRecPtrIfNeeded() there, the function is called
> > every time a record is written, most of which are wasteful.
> > XLogInsertRecord already has a code block executed only at every page
> > boundary.
>
> I agree.
> XLogMarkEndRecPtrIfNeeded() is moved into the code block before updating
> LogwrtRqst.Write for avoiding passing-each-other with writer.
>
>
> > Now we can identify stable portion of WAL stream. It's enough to
> > prevent walsender from sending data that can be overwritten
> > afterwards. GetReplicationTargetRecPtr() in the attached does that.
>
> I didn't notice it.
> I agree basically, but it is based on lastSegContRecStart/End.
>
> So, first of all, we have to agree what should be remebered.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-07-06 05:29:28 Re: track_planning causing performance regression
Previous Message Pavel Stehule 2020-07-06 05:05:37 Re: proposal - function string_to_table