From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: .ready and .done files considered harmful |
Date: | 2021-08-18 20:35:12 |
Message-ID: | 27F8BC15-6242-44C8-9818-5D02A3BF8F5C@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the new version of the patch. Overall, I think it is on
the right track.
+ /*
+ * This .ready file is created out of order, notify archiver to perform
+ * a full directory scan to archive corresponding WAL file.
+ */
+ StatusFilePath(archiveStatusPath, xlog, ".ready");
+ if (stat(archiveStatusPath, &stat_buf) == 0)
+ {
+ PgArchEnableDirScan();
+ PgArchWakeup();
+ }
Should we have XLogArchiveNotify(), writeTimeLineHistory(), and
writeTimeLineHistoryFile() enable the directory scan instead? Else,
we have to exhaustively cover all such code paths, which may be
difficult to maintain. Another reason I am bringing this up is that
my patch for adjusting .ready file creation [0] introduces more
opportunities for .ready files to be created out-of-order.
+ /*
+ * This is a fall-back path, check if we are here due to the unavailability
+ * of next anticipated log segment or the archiver is being forced to
+ * perform a full directory scan. Reset the flag in shared memory only if
+ * it has been enabled to force a full directory scan and then proceed with
+ * directory scan.
+ */
+ if (PgArch->dirScan)
+ PgArch->dirScan = false;
Why do we need to check that the flag is set before we reset it? I
think we could just always reset it since we are about to do a
directory scan anyway.
On 8/18/21, 7:25 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2021 at 4:19 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>> Thinking further, I think the most important thing to ensure is that
>> resetting the flag happens before we begin the directory scan.
>> Consider the following scenario in which a timeline history file would
>> potentially be lost:
>>
>> 1. Archiver completes directory scan.
>> 2. A timeline history file is created and the flag is set.
>> 3. Archiver resets the flag.
>
> Dipesh says in his latest email that the archiver resets the flag just
> before it begins a directory scan. If that's accurate, then I think
> this sequence of events can't occur.
>
> If there is a race condition here with setting the flag, then an
> alternative design would be to use a counter - either a plain old
> uint64 or perhaps pg_atomic_uint64 - and have the startup process
> increment the counter when it wants to trigger a scan. In this design,
> the archiver would never modify the counter itself, but just remember
> the last value that it saw. If it later sees a different value it
> knows that a full scan is required. I think this kind of system is
> extremely robust against the general class of problems that you're
> talking about here, but I'm not sure whether we need it, because I'm
> not sure whether there is a race with just the bool.
I'm not sure, either. Perhaps it would at least be worth adding a
pg_memory_barrier() after setting dirScan to false to avoid the
scenario I mentioned (which may or may not be possible). IMO this
stuff would be much easier to reason about if we used a lock instead,
even if the synchronization was not strictly necessary. However, I
don't want to hold this patch up too much on this point.
Nathan
[0] https://postgr.es/m/05AD5FE2-9A53-4D11-A3F8-3A83EBB0EB93%40amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2021-08-18 21:01:40 | support for windows robocopy in archive_command and restore_command |
Previous Message | Peter Geoghegan | 2021-08-18 19:58:27 | Re: The Free Space Map: Problems and Opportunities |