Re: .ready and .done files considered harmful

From: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2021-08-02 13:06:00
Message-ID: CAN1g5_GbOgwxtOa=xaKDrXpwVBgxTUdzMWmFYseLFaFmNYUPEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> I think what you are saying is true before v14, but not in v14 and master.
Yes, we can use archiver specific shared memory. Thanks.

> I don't think it's great that we're using up SIGINT for this purpose.
> There aren't that many signals available at the O/S level that we can
> use for our purposes, and we generally try to multiplex them at the
> application layer, e.g. by setting a latch or a flag in shared memory,
> rather than using a separate signal. Can we do something of that sort
> here? Or maybe we don't even need a signal. ThisTimeLineID is already
> visible in shared memory, so why not just have the archiver just check
> and see whether it's changed, say via a new accessor function
> GetCurrentTimeLineID()? I guess there could be a concern about the
> expensive of that, because we'd probably be taking a spinlock or an
> lwlock for every cycle, but I don't think it's probably that bad,
> because I doubt we can archive much more than a double-digit number of
> files per second even with a very fast archive_command, and contention
> on a lock generally requires a five digit number of acquisitions per
> second. It would be worth testing to see if we can see a problem here,
> but I'm fairly hopeful that it's not an issue. If we do feel that it's
> important to avoid repeatedly taking a lock, let's see if we can find
> a way to do it without dedicating a signal to this purpose.

We can maintain the current timeline ID in archiver specific shared memory.
If we switch to a new timeline then the backend process can update the new
timeline ID in shared memory. Archiver can keep a track of current timeline
ID
and if it finds that there is a timeline switch then it can perform a full
directory
scan to make sure that archiving history files takes precedence over WAL
files.
Access to the shared memory area can be protected by adding a
WALArchiverLock.
If we take this approach then it doesn't require to use a dedicated signal
to notify
a timeline switch.

> The problem with all this is that you can't understand either function
> in isolation. Unless you read them both together and look at all of
> the ways these three variables are manipulated, you can't really
> understand the logic. And there's really no reason why that needs to
> be true. The job of cleaning timeline_switch and setting dirScan could
> be done entirely within pgarch_readyXlog(), and so could the job of
> incrementing nextLogSegNo, because we're not going to again call
> pgarch_readyXlog() unless archiving succeeded.

> Also note that the TLI which is stored in curFileTLI corresponds to
> the segment number stored in nextLogSegNo, yet one of them has "cur"
> for "current" in the name and the other has "next". It would be easier
> to read the code if the names were chosen more consistently.

> My tentative idea as to how to clean this up is: declare a new struct
> with a name like readyXlogState and members lastTLI and lastSegNo.
> Have pgarch_ArchiverCopyLoop() declare a variable of this type, zero
> it, pass it as a parameter to pgarch_readyXlog(), and otherwise leave
> it alone. Then let pgarch_readyXlog() do all of the manipulation of
> the values stored therein.

Make sense, we can move the entire logic to a single function
pgarch_readyXlog()
and declare a new struct readyXLogState.

I think we cannot declare a variable of this type in
pgarch_ArchiverCopyLoop()
due to the fact that this function will be called every time the archiver
wakes up.
Initializing readyXLogState here will reset the next anticipated log
segment number
when the archiver wakes up from a wait state. We can declare and initialize
it in
pgarch_MainLoop() to avoid resetting the next anticipated log segment
number
when the archiver wakes up.

> You've moved this comment from its original location, but the trouble
> is that the comment is 100% false. In fact, the whole reason why you
> wrote this patch is *because* this comment is 100% false. In fact it
> is not difficult to create cases where each scan finds many files, and
> the purpose of the patch is precisely to optimize the code that the
> person who wrote this thought didn't need optimizing. Now it may take
> some work to figure out what we want to say here exactly, but
> preserving the comment as it's written here is certainly misleading.

Yes, I agree. We can update the comments here to list the scenarios
where we may need to perform a full directory scan.

I have incorporated these changes and updated a new patch. Please find
the attached patch v5.

Thanks,
Dipesh

Attachment Content-Type Size
v5-0001-mitigate-directory-scan-for-WAL-archiver.patch text/x-patch 12.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-08-02 13:07:18 Re: Background writer and checkpointer in crash recovery
Previous Message Илья Гладышев 2021-08-02 12:29:57 Update of partition key on foreign server