From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
---|---|
To: | Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: .ready and .done files considered harmful |
Date: | 2021-09-14 16:18:34 |
Message-ID: | 40DBACB6-963B-434D-99A3-B21E607FF240@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 9/14/21, 7:23 AM, "Dipesh Pandit" <dipesh(dot)pandit(at)gmail(dot)com> wrote:
> I agree that when we are creating a .ready file we should compare
> the current .ready file with the last .ready file to check if this file is
> created out of order. We can store the state of the last .ready file
> in shared memory and compare it with the current .ready file. I
> believe that archiver specific shared memory area can be used
> to store the state of the last .ready file unless I am missing
> something and this needs to be stored in a separate shared
> memory area.
>
> With this change, we have the flexibility to move the current archiver
> state out of shared memory and keep it local to archiver. I have
> incorporated these changes and updated a new patch.
I wonder if this can be simplified even further. If we don't bother
trying to catch out-of-order .ready files in XLogArchiveNotify() and
just depend on the per-checkpoint/restartpoint directory scans, we can
probably remove lastReadySegNo from archiver state completely.
+ /* Force a directory scan if we are archiving anything but a regular
+ * WAL file or if this WAL file is being created out-of-order.
+ */
+ if (!IsXLogFileName(xlog))
+ PgArchForceDirScan();
+ else
+ {
+ TimeLineID tli;
+ XLogSegNo last_segno;
+ XLogSegNo this_segno;
+
+ last_segno = PgArchGetLastReadySegNo();
+ XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+
+ /*
+ * Force a directory scan in case if this .ready file created out of
+ * order.
+ */
+ last_segno++;
+ if (last_segno != this_segno)
+ PgArchForceDirScan();
+
+ PgArchSetLastReadySegNo(this_segno);
+ }
This is an interesting idea, but the "else" block here seems prone to
race conditions. I think we'd have to hold arch_lck to prevent that.
But as I mentioned above, if we are okay with depending on the
fallback directory scans, I think we can remove the "else" block
completely.
+ /* Initialize the current state of archiver */
+ xlogState.lastSegNo = MaxXLogSegNo;
+ xlogState.lastTli = MaxTimeLineID;
It looks like we have two ways to force a directory scan. We can
either set force_dir_scan to true, or lastSegNo can be set to
MaxXLogSegNo. Why not just set force_dir_scan to true here so that we
only have one way to force a directory scan?
+ /*
+ * Failed to archive, make sure that archiver performs a
+ * full directory scan in the next cycle to avoid missing
+ * the WAL file which could not be archived due to some
+ * failure in current cycle.
+ */
+ PgArchForceDirScan();
Don't we also need to force a directory scan in the other cases we
return early from pgarch_ArchiverCopyLoop()? We will have already
advanced the archiver state in pgarch_readyXlog(), so I think we'd end
up skipping files if we didn't. For example, if archive_command isn't
set, we'll just return, and the next call to pgarch_readyXlog() might
return the next file.
+ /* Continue directory scan until we find a regular WAL file */
+ SpinLockAcquire(&PgArch->arch_lck);
+ PgArch->force_dir_scan = true;
+ SpinLockRelease(&PgArch->arch_lck);
nitpick: I think we should just call PgArchForceDirScan() here.
Nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-09-14 17:44:26 | Re: Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition. |
Previous Message | Alvaro Herrera | 2021-09-14 15:57:47 | Re: Remove duplicate static function check_permissions in slotfuncs.c and logicalfuncs.c |