Re: global / super barriers (for checksums)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: global / super barriers (for checksums)
Date: 2019-11-13 17:26:34
Message-ID: CA+TgmobRhjD+NPxjcJSjwpGOxt_w6cObsYS4EtZntUc7S1Oe3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 30, 2018 at 1:17 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> The patch definitely is in a prototype stage. At the very least it needs
> a high-level comment somewhere, and some of the lower-level code needs
> to be cleaned up.
>
> One thing I wasn't happy about is how checksum internals have to absorb
> barrier requests - that seems unavoidable, but I'd hope for something
> more global than just BufferSync().

Hi,

TL;DR: I'm not sure that we need 0001; I propose to commit 0002; and I
have some concerns about 0003 and am interested in working further on
it.

0001 changes the StartBackgroundWorker so that the SIGINT handler is
contingent on BGWORKER_SHMEM_ACCESS rather than
BGWORKER_BACKEND_DATABASE_CONNECTION. It seems to me that the goal
here should be to use procsignal_sigusr1_handler(), or something that
calls it, in any process where ProcSignalInit() is called, but a
backend that only requests BGWORKER_SHMEM_ACCESS probably won't,
because the normal way for a background worker to call
ProcSignalInit() would be to indirectly call InitPostgres() by means
of BackgroundWorkerInitializeConnection() or
BackgroundWorkerInitializeConnectionByOid(), and a worker that didn't
ask for a database connection presumably shouldn't be doing that. So
I'm not sure that I understand why we need this. (Is it legal for a
worker to call one of these functions as long as it passes InvalidOid
for the database OID, or something? Do we have examples of workers
that do that?)

On the other hand, 0002 seems like it's pretty clearly a good idea. It
makes a whole bunch of auxiliary processes use
procsignal_sigusr1_handler() and those things all get called from
AuxiliaryProcessMain(), which does ProcSignalInit(), and it seems like
clearly the right idea that processes which register to participate in
the procsignal mechanism should also register to get notified if they
receive a procsignal. I think that the reason we haven't bothered with
this up until now is because I think that it's presently impossible
for any of the kind of procsignals that we have to get sent to any of
those processes. But, global barriers would require us to do so, so it
seems like it's time to tighten that up, and it doesn't really cost
anything. So I propose to commit this part soon, unless somebody
objects.

Regarding 0003:

- The proc->pid == 0 check in EmitGlobalBarrier() doesn't really do
what it appears to do, because regular backends don't clear proc->pid
when they exit; only auxiliary processes do. (We could, and perhaps
should, change that.)

- It seems to me that it would be generally better to insert
CHECK_FOR_INTERRUPTS() in places where the patch just inserts an
ad-hoc if (GlobalBarrierInterruptPending)
ProcessGlobalBarrierIntterupt(). There might be someplace where we
have to do it the latter way because we unavoidably hold an LWLock or
something, but I think we should avoid that whenever possible. If it's
a good place to check for one kind of interrupt, it's probably a good
place to check for all of them.

- I don't think BufferSync() is doing this in the right place. Unless
I am confused, it's doing it inside a loop that just allocates stuff
and copies data around, which probably does not need this kind of
decoration. It wouldn't hurt to check here, and it might be a good
idea for safety, but I think the place that really needs this
treatment is the following loop where we are actually doing I/O and
sleeping. Possibly we should even be doctoring CheckpointWriteDelay to
use a latch-wait loop that can CHECK_FOR_INTERRUPTS() before
re-sleeping, although for 100ms (curiously described as a Big Sleep)
it might be overkill.

- I think it would be nice to make this system more extensible. IIUC,
the idea is that ProcessGlobalBarrierIntterupt() will call bespoke
code for each GLOBBAR_* flag that set, and that code will then update
process-local state from global state before returning. But, instead
of hard-coding constants, how about just having a list of callbacks,
and we call them all here, and each one is responsible for figuring
out whether anything's been changed that it cares about, and if so
updating the appropriate local state? Then GlobalBarrierKind goes away
altogether. Granted, this could be a bit expensive if we were using
global barriers for lots of different things and at least some of
those things occurred frequently, but I don't think we're very near
the point where we need that kind of optimization. As long as the
callbacks have tests that exit quickly if there's nothing to do, I
think it should be fine. (As an intermediate position, we could
consider keeping barrierFlags but assign the bits dynamically; but
unless and until we get more users of the facility, I think it might
be simplest to just forget about barrierFlags altogether. Then even
extensions can use this, if they want.)

- The patch needs some general tidying up, like comments and naming
consistency and stuff like that.

Andres, Magnus, if neither of you are planning to work on this soon, I
might like to jump in and run with this. Please let me know your
thoughts.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2019-11-13 17:40:27 RE: [PATCH] Improve AtSubCommit_childXids
Previous Message Andres Freund 2019-11-13 17:10:39 Re: [PATCH] Improve AtSubCommit_childXids