Re: Weird failure with latches in curculio on v15

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Weird failure with latches in curculio on v15
Date: 2023-02-20 23:17:21
Message-ID: Y/P/gU1WZn7fj7E0@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Michael Paquier (michael(at)paquier(dot)xyz) wrote:
> On Sun, Feb 19, 2023 at 08:06:24PM +0530, Robert Haas wrote:
> > I mean, my idea was to basically just have one big callback:
> > ArchiverModuleMainLoopCB(). Which wouldn't return, or perhaps, would
> > only return when archiving was totally caught up and there was nothing
> > more to do right now. And then that callback could call functions like
> > AreThereAnyMoreFilesIShouldBeArchivingAndIfYesWhatIsTheNextOne(). So
> > it would call that function and it would find out about a file and
> > start an HTTP session or whatever and then call that function again
> > and start another HTTP session for the second file and so on until it
> > had as much concurrency as it wanted. And then when it hit the
> > concurrency limit, it would wait until at least one HTTP request
> > finished. At that point it would call
> > HeyEverybodyISuccessfullyArchivedAWalFile(), after which it could
> > again ask for the next file and start a request for that one and so on
> > and so forth.
>
> This archiving implementation is not completely impossible with the
> current API infrastructure, either?  If you consider archiving as a
> two-step process, segments are first copied into a cheap, reliable
> area, and these could then be pushed in bulk to a more remote area
> like an S3 bucket.  Of course this depends on other things like the
> cluster structure, but redundancy can be added with standby
> archiving, as well.
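
(Just to make sure we're all picturing the same shape of API: my reading
of the "one big callback" idea quoted above is a loop roughly like the
sketch below.  The callback names are the placeholders from Robert's
mail and the helpers and the concurrency limit are invented for
illustration- nothing like this exists anywhere yet.)

#define MAX_CONCURRENT_PUSHES 4     /* whatever the module wants */

static void
ArchiverModuleMainLoopCB(void)
{
    int         in_flight = 0;
    const char *wal;

    for (;;)
    {
        /* Start requests until we hit the module's concurrency limit. */
        while (in_flight < MAX_CONCURRENT_PUSHES)
        {
            wal = AreThereAnyMoreFilesIShouldBeArchivingAndIfYesWhatIsTheNextOne();
            if (wal == NULL)
                break;
            start_http_push(wal);   /* hypothetical async HTTP request */
            in_flight++;
        }

        if (in_flight == 0)
            break;                  /* fully caught up, return for now */

        /* Wait for at least one request to finish, then report it. */
        wal = wait_for_any_push_to_finish();    /* hypothetical */
        HeyEverybodyISuccessfullyArchivedAWalFile(wal);
        in_flight--;
    }
}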

Surely that staging area can't be too cheap if it also needs to be
reliable.  We have looked at this before (copying to a local queue area
and then copying it off-system with a separate process) and it simply
isn't great: it requires more work than you really want to do if you
can help it, for no real benefit.
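
(For anyone who hasn't implemented that pattern: the "queue area" step
amounts to making a durable local copy of the segment before the server
is told it has been archived, roughly as sketched below, after which a
separate process pushes the staged segments off-system in bulk.  The
code is illustrative only- error handling is minimal and the paths are
up to you.)

#include <fcntl.h>
#include <unistd.h>

/*
 * Durably copy a finished WAL segment into a local staging directory.
 * A full implementation would also fsync the containing directory and
 * clean up properly on failure.
 */
static int
stage_segment_locally(const char *src_path, const char *dst_path)
{
    char        buf[65536];
    ssize_t     nread;
    int         src = open(src_path, O_RDONLY);
    int         dst = open(dst_path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (src < 0 || dst < 0)
        return -1;

    while ((nread = read(src, buf, sizeof(buf))) > 0)
        if (write(dst, buf, nread) != nread)
            return -1;

    /* The staged copy is worthless if it doesn't survive a crash. */
    if (nread < 0 || fsync(dst) != 0)
        return -1;

    close(src);
    close(dst);
    return 0;
}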

> I am not sure exactly how many requirements we want to push into a
> callback, to be honest, and surely more requirements pushed to the
> callback increases the odds of implementation mistakes, like a full
> loop. There are already many ways to get it wrong with archiving, like
> missing a flush of the archived segment before the callback returns to
> ensure its durability..

Without any actual user of any of this, it's surprising to me how much
effort has been put into it.  Have I missed the part where someone said
they're actually implementing an archive library that we could look at,
to see how it works and how the archive library and the core system
could work better together?

We (pgbackrest) are generally interested in the idea as a way to reduce
startup time, but that's not really a big issue for us currently, so it
hasn't risen to the level of being something we're actively working on.
Not to mention that if this keeps changing each release, it's just going
to end up being more work for us for a feature that doesn't gain us all
that much.

Now, all that said, at least in initial discussions, we expect the
pgbackrest archive_library to look very similar to how we handle
archive_command and async archiving today: when called, if there are
multiple WAL files to process, we fork off an async process, which in
turn spawns multiple processes and does the work of moving the WAL
files to the off-system storage; when we are called via archive_command
we just check a status flag to see whether that WAL has been archived
yet by the async process or not.  If not, and there's no async process
running, then we'll start a new one (starting a new async process
periodically actually makes things a lot easier for us to test, which
is why we don't just have an async process running around forever- the
startup time typically isn't that big of a deal); if there is a status
flag then we return whatever it says; and if the async process is
running but there's no status flag yet, then we wait.
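
(In code terms, the decision we make each time archive_command fires for
a segment is roughly the sketch below.  This is not pgbackrest's actual
implementation- the helper names and the status-flag convention are
stand-ins for illustration only.)

#include <stdbool.h>

typedef enum
{
    STATUS_NONE,                /* async process hasn't reported yet */
    STATUS_OK,                  /* segment archived successfully */
    STATUS_ERROR                /* segment failed; let the server retry */
} SegmentStatus;

static bool
archive_one_segment(const char *wal_name)
{
    for (;;)
    {
        SegmentStatus status = read_status_flag(wal_name);  /* hypothetical */

        if (status == STATUS_OK)
            return true;        /* async process already took care of it */
        if (status == STATUS_ERROR)
            return false;

        /* No flag yet: make sure an async process exists, then wait. */
        if (!async_process_is_running())        /* hypothetical */
            spawn_async_archiver();             /* hypothetical */

        wait_for_status_change(wal_name);       /* hypothetical */
    }
}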

Once we have that going, perhaps there could be some interesting
iteration between pgbackrest and the core code to improve things.  As
it stands, though, all this discussion and churn feels more likely to
put folks off of trying to implement something using this approach than
the opposite- unless someone in this discussion is actually working on
an archive library, but that isn't the impression I've gotten, at least
(though if there is such a work in progress out there, I'd love to see
it!).

Thanks,

Stephen
