Re: Weird failure with latches in curculio on v15

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Weird failure with latches in curculio on v15
Date: 2023-02-09 16:12:21
Message-ID: CA+TgmoY6xxDdZ1YJjMb27Ei7qgFcbu=__WtDX_cOd3GQ5uvxWw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 9, 2023 at 10:51 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm fairly concerned about the idea of making it common for people
> to write their own main loop for the archiver. That means that, if
> we have a bug fix that requires the archiver to do X, we will not
> just be patching our own code but trying to get an indeterminate
> set of third parties to add the fix to their code.

I don't know what kind of bug we could really have in the main loop
that would be common to every implementation. They're probably all
going to check for interrupts, do some work, and then wait for I/O on
some things by calling select() or some equivalent. But the work, and
the wait for the I/O, would be different for every implementation. I
would anticipate that the amount of common code would be nearly zero.
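
To make that concrete, I'd expect a module-owned loop to be some
variation on this rough sketch -- the my_* helpers are made-up names
for module-private code, not anything that exists today:

    /* Sketch only; my_do_archive_work(), my_collect_fds(), and
     * my_handle_ready_fds() are hypothetical module-private helpers. */
    for (;;)
    {
        fd_set      rfds;
        fd_set      wfds;
        int         maxfd;

        CHECK_FOR_INTERRUPTS();     /* shutdown, config reload, etc. */

        my_do_archive_work();       /* start or advance whatever transfers we can */

        maxfd = my_collect_fds(&rfds, &wfds);   /* fds we're blocked on */

        if (select(maxfd + 1, &rfds, &wfds, NULL, NULL) > 0)
            my_handle_ready_fds(&rfds, &wfds);  /* resume the I/O that unblocked */
    }

Only the skeleton is shared; every interesting line in there is
module-specific.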

Imagine two archive modules, one of which archives files via HTTP and
the other of which archives them via SSH. They need to do a lot of the
same things, but the code is going to be totally different. When the
HTTP archiver module needs to open a new connection, it's going to
call some libcurl function. When the SSH archiver module needs to do
the same thing, it's going to call some libssh function. It seems
quite likely that the HTTP implementation would want to juggle
multiple connections in parallel, but the SSH implementation might not
want to do that, or its logic for determining how many connections to
open might be completely different based on the behavior of that
protocol vs. the other protocol. Once either implementation has sent
as much data as it can over the connections it has open, it needs to wait
for those sockets to become write-ready or, possibly, read-ready.
There again, each one will be calling into a different library to do
that. It could be that in this particular case, both would be waiting
for a set of file descriptors, and we could provide some framework for
waiting on a set of file descriptors provided by the module. But you
could also have some other archiver implementation that is, say,
waiting for a process to terminate rather than for a file descriptor
to become ready for I/O.
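
For instance, the HTTP module's work-and-wait step might look roughly
like this using libcurl's multi interface, where put_next_wal_file()
is a made-up helper that adds one easy handle per pending WAL file:

    CURLM      *multi = curl_multi_init();
    int         still_running = 0;

    while (put_next_wal_file(multi))    /* hypothetical: queue pending files */
        ;

    do
    {
        fd_set      rfds, wfds, efds;
        int         maxfd = -1;

        curl_multi_perform(multi, &still_running);

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_ZERO(&efds);
        curl_multi_fdset(multi, &rfds, &wfds, &efds, &maxfd);

        /* these are the fds the archiver would have to wait on; an SSH
         * module would get a completely different set from libssh */
        if (maxfd >= 0)
            select(maxfd + 1, &rfds, &wfds, &efds, NULL);
    } while (still_running > 0);

The SSH module's loop would have the same overall shape, but every
call in it would be into a different library.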

> If we think we need primitives to let the archiver hooks get all
> the pending files, or whatever, by all means add those. But don't
> cede fundamental control of the archiver. The hooks need to be
> decoration on a framework we provide, not the framework themselves.

I don't quite see how you can make asynchronous and parallel archiving
work if the archiver process only calls into the archive module at
times that it chooses. That would mean that the module has to return
control to the archiver when it's in the middle of archiving one or
more files -- and then I don't see how it can get control back at the
appropriate time. Do you have a thought about that?
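
The closest thing I can picture is something like the following -- and
to be clear, this is not a proposal or the existing archive module API,
just an illustration of the shape of the problem. The module would have
to report what it's blocked on so that the core loop could hand control
back at the right moment:

    /* Purely hypothetical, not the existing archive module API. */
    typedef struct ArchiveAsyncWait
    {
        int         nfds;
        int        *fds;        /* sockets the module is blocked on */
        int        *events;     /* WL_SOCKET_READABLE / WL_SOCKET_WRITEABLE */
    } ArchiveAsyncWait;

    /* Start or continue archiving; fill *wait with what to wait for next.
     * Return true when nothing is left in flight. */
    typedef bool (*archive_work_cb) (ArchiveAsyncWait *wait);

But even that only covers the wait-on-sockets case, not a module that
is waiting on a child process or something else entirely.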

--
Robert Haas
EDB: http://www.enterprisedb.com
