RE: archive status ".ready" files may be created too early

From: "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>
To: "'Bossart, Nathan'" <bossartn(at)amazon(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: archive status ".ready" files may be created too early
Date: 2020-06-19 10:18:34
Message-ID: OSAPR01MB5027F3C28DBC8B930E15C6A6E8980@OSAPR01MB5027.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

> On 5/28/20, 11:42 PM, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>
> wrote:
> > I'm preparing a patch that backend inserting segment-crossboundary
> > WAL record leaves its EndRecPtr and someone flushing it checks
> > the EndRecPtr and notifies..

Sorry for the slow progress.

I have attached a patch.
I have also attached a simple targeted test for the primary.

1. Description of the primary side

[Basic problem]
A process flushing WAL does not know whether the flushed RecPtr is the
EndRecPtr of a WAL record that crosses a segment boundary. Only the process
inserting the record knows that, and it never stores the information anywhere.

[Basic concept of the patch in primary]
A process inserting a cross-segment-boundary WAL record stores its EndRecPtr
(I call it CrossBoundaryEndRecPtr) in a new structure in XLogCtl.
A flushing process creates .ready (hereafter I just call this 'notify') for
the segment preceding the one that contains CrossBoundaryEndRecPtr, and only when its
flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

[Detail of implementation in primary]
* Structure of CrossBoundaryEndRecPtrs
The requirements for the structure are as follows:
- The system must remember multiple CrossBoundaryEndRecPtrs.
- A flushing process must be able to decide quickly whether to notify, using only the flushed RecPtr.

Therefore, I implemented the structure as an array (I call it the CrossBoundaryEndRecPtr array)
with the same length as the xlblocks array. Strictly speaking, a length of
'xbuffers/wal_segment_size' is enough, but I chose 'xbuffers' for simplicity, because it
lets the flushing process use XLogRecPtrToBufIdx().
See also the definitions of XLogCtl, XLOGShmemSize(), and XLOGShmemInit() in my patch.
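
To make the layout concrete, here is a minimal standalone sketch of such an array. It is not taken from the patch: the struct name, the helper names, and the two constants (page size and buffer count) are assumptions chosen for illustration. Only the index arithmetic (page number modulo the number of WAL buffers) follows what XLogRecPtrToBufIdx() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;              /* byte position in the WAL stream */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

#define SKETCH_BLCKSZ   8192u             /* assumed WAL page size */
#define SKETCH_NBUFFERS 512u              /* assumed number of WAL buffers */

/* Hypothetical stand-in for the new member added to XLogCtl. */
typedef struct SketchXLogCtl
{
    XLogRecPtr crossBoundaryEndRecPtrs[SKETCH_NBUFFERS];
} SketchXLogCtl;

/* Same arithmetic as XLogRecPtrToBufIdx(): page number modulo buffer count. */
static size_t
sketch_buf_idx(XLogRecPtr ptr)
{
    return (size_t) ((ptr / SKETCH_BLCKSZ) % SKETCH_NBUFFERS);
}

/* Mark every slot as unused, as shared-memory initialization would. */
static void
sketch_init(SketchXLogCtl *ctl)
{
    assert(ctl != NULL);
    for (size_t i = 0; i < SKETCH_NBUFFERS; i++)
        ctl->crossBoundaryEndRecPtrs[i] = InvalidXLogRecPtr;
}
```

With one slot per WAL buffer page, a slot can be located from a RecPtr alone, at the cost of an array that is larger than strictly necessary.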

* Action of inserting process
An inserting process stores its CrossBoundaryEndRecPtr in the CrossBoundaryEndRecPtr
array element whose index is calculated by XLogRecPtrToBufIdx() from that CrossBoundaryEndRecPtr.
If the WAL record crosses many segments, only the element for the last segment,
the one that contains the EndRecPtr, is set; the others are not set.
See also CopyXLogRecordToWAL() in my patch.
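
The inserting side can be sketched as follows. This is an illustration under stated assumptions, not the code in CopyXLogRecordToWAL(): the function names, the global array, and the constants are hypothetical, and the sketch assumes that a record ending exactly on a segment boundary does not count as crossing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

#define SKETCH_BLCKSZ    8192u                           /* assumed page size */
#define SKETCH_NBUFFERS  512u                            /* assumed buffer count */
#define SKETCH_SEG_SIZE  ((XLogRecPtr) 16 * 1024 * 1024) /* assumed segment size */

/* Hypothetical CrossBoundaryEndRecPtr array; zero means "not set". */
static XLogRecPtr sketch_slots[SKETCH_NBUFFERS];

/* True if the record's first and last bytes lie in different segments. */
static bool
sketch_crosses_segment(XLogRecPtr start, XLogRecPtr end)
{
    assert(end > start);
    return (start / SKETCH_SEG_SIZE) != ((end - 1) / SKETCH_SEG_SIZE);
}

/*
 * Store the end pointer of a boundary-crossing record. Only one slot is
 * written, derived from the end pointer, even if the record spans several
 * segments. Returns true if a slot was written.
 */
static bool
sketch_remember_end(XLogRecPtr start, XLogRecPtr end)
{
    if (!sketch_crosses_segment(start, end))
        return false;
    sketch_slots[(end / SKETCH_BLCKSZ) % SKETCH_NBUFFERS] = end;
    return true;
}
```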

* Action of flushing process
The overview was already given above:
A flushing process creates .ready (hereafter I just call this 'notify') for
the segment preceding the one that contains CrossBoundaryEndRecPtr, and only when its
flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

One additional detail: the flushing process may have to notify
many segments if the record crosses many segments, so the process stores
the latest notified segment number in latestArchiveNotifiedSegNo in XLogCtl.
The process notifies segments from latestArchiveNotifiedSegNo + 1 through
the flushing segment number - 1.
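
The range rule above can be sketched as a small standalone function. This is a simplified illustration, not the patch: the function name and segment size are assumptions, the pending CrossBoundaryEndRecPtr is passed in as a parameter instead of being looked up in the array, and the sketch conservatively notifies nothing while that record is only partly flushed. Printing stands in for creating the .ready file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define SKETCH_SEG_SIZE ((XLogRecPtr) 16 * 1024 * 1024)  /* assumed segment size */

/* Last segment already notified; protected by WALWriteLock in the real system. */
static XLogSegNo latestArchiveNotifiedSegNo = 0;

/*
 * Notify every segment from latestArchiveNotifiedSegNo + 1 through the
 * segment before the one containing 'flushed'. Returns how many segments
 * were notified.
 */
static int
sketch_notify_after_flush(XLogRecPtr flushed, XLogRecPtr crossBoundaryEnd)
{
    XLogSegNo flushSegNo = flushed / SKETCH_SEG_SIZE;
    int       notified = 0;

    assert(flushed != InvalidXLogRecPtr);

    /* The boundary-crossing record is not completely on disk yet: wait. */
    if (crossBoundaryEnd != InvalidXLogRecPtr && flushed < crossBoundaryEnd)
        return 0;

    for (XLogSegNo seg = latestArchiveNotifiedSegNo + 1; seg < flushSegNo; seg++)
    {
        printf("notify segment %llu\n", (unsigned long long) seg);
        latestArchiveNotifiedSegNo = seg;
        notified++;
    }
    return notified;
}
```

For example, with latestArchiveNotifiedSegNo at 0 and a record ending 100 bytes into segment 3, a flush that stops 10 bytes into segment 3 notifies nothing, and a later flush that reaches the record's end notifies segments 1 and 2.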

latestArchiveNotifiedSegNo is set to EndOfLog after the Startup process exits
the replay loop. A standby sets it at the same point (that is, before promotion).

Mutual exclusion for latestArchiveNotifiedSegNo is not required because
every access to it already happens inside the WALWriteLock critical section.

2. Description of the standby side

* Who notifies?
walreceiver also does not know whether the flushed RecPtr is the EndRecPtr of a
cross-segment-boundary WAL record. On a standby, only the Startup process
knows this, because the information is hidden in the WAL record itself and only
the Startup process reads and assembles WAL records.

* Action of Startup process
Therefore, I implemented it so that walreceiver never notifies and the Startup process does it instead.
In detail, when the Startup process has read one complete WAL record, it notifies
from the segment that contains the head (ReadRecPtr) of the record up to the segment
preceding the one that contains the EndRecPtr of the record.
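
As an illustration of that range, here is a minimal sketch. The function name and segment size are assumptions, and printing stands in for creating the .ready file; it is not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
#define SKETCH_SEG_SIZE ((XLogRecPtr) 16 * 1024 * 1024)  /* assumed segment size */

/*
 * After one complete record has been read, notify every segment from the
 * one containing readRecPtr up to the one before the segment containing
 * endRecPtr. Returns how many segments were notified; a record that stays
 * inside one segment yields zero.
 */
static int
sketch_standby_notify(XLogRecPtr readRecPtr, XLogRecPtr endRecPtr)
{
    XLogSegNo firstSegNo = readRecPtr / SKETCH_SEG_SIZE;
    XLogSegNo endSegNo = endRecPtr / SKETCH_SEG_SIZE;
    int       notified = 0;

    assert(endRecPtr > readRecPtr);

    for (XLogSegNo seg = firstSegNo; seg < endSegNo; seg++)
    {
        printf("notify segment %llu\n", (unsigned long long) seg);
        notified++;
    }
    return notified;
}
```

Because the range is derived from a record that has been read in full, a segment is never notified while the record that spills out of it is still incomplete.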

Now, we must pay attention to timeline switches.
The last segment of the previous TimeLineID must be notified before switching.
This case is handled when RM_XLOG_ID is detected.

3. About the other segment notifications
Two other kinds of segment notification remain. They do not need to be fixed.

(1) Notification for a partial segment
A partial segment does not need to be complete, so it is OK to notify without special consideration.

(2) Re-notification
Currently, Checkpointer notifies through XLogArchiveCheckDone().
It is a safety net for a failed notification by a backend or WAL writer.
A backend or WAL writer does not retry if notifying fails, but Checkpointer retries
the notification when it removes an old segment. If it fails to notify, it does not
remove the segment. This makes Checkpointer retry until the notification succeeds.
In this case, too, we can just notify without special consideration,
because Checkpointer guarantees that all WAL records contained in the segment have already been flushed.

Your review and comments would be appreciated.

Regards
Ryo Matsumura

Attachment Content-Type Size
test_in_primary.sh application/octet-stream 1.8 KB
bugfix_early_archiving_v1.0.patch application/octet-stream 8.7 KB
