Re: global / super barriers (for checksums)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: global / super barriers (for checksums)
Date: 2019-12-02 18:06:24
Message-ID: CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 17, 2019 at 8:38 AM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> I'm definitely happy to work with it, but I did not and do not feel I have the skills for doing the "proper review" needed for it. So I am also very happy for you to pick it up and run with it.

OK, here's what I came up with.

0001 is just a code movement patch. It puts the interrupt handling for
each type of background process into a subroutine, instead of having
it all inline in the main loop. I feel this makes things more clear.
Hopefully it's uncontroversial.

0002 and 0003 are optional and slightly off-topic as far as the matter
at hand, but they are not entirely unrelated and I believe that they
are good cleanups. 0002 changes assorted background processes to use
PostgresSigHupHandler and ConfigReloadPending rather than having a
bunch of separate global variables and a bunch of separate signal
handlers that basically do the same stuff. 0003 takes this a step
further by trying to unify a bunch more error-handling across
different background process types. Together, 0002 and 0003 save >300
lines of code and as far as I can see there's basically no downside. I
think in general it would be good to strive towards a future where
everybody used the same signal handling and interrupt handling, but
these patches have the much less ambitious goal of just removing
overtly duplicated code. I think they might act as an inducement to
future patch authors to try to make things look more alike, though.

0004 is the main patch. It is substantially revised from Andres's
version, but the basic principle is along similar lines. Changes:

- I felt that it was imprudent to put some of the machinery into
PGPROC and the rest into the ProcSignal mechanism, because there is
nothing that says that everything with a PGPROC must also be a
ProcSignal participant. So I moved everything over into the ProcSignal
mechanism. This seems a lot safer to me.

- I accordingly renamed this thing from GlobalBarrier to
ProcSignalBarrier. This is a lot more arguable, but as of now I favor
it.

- There details of the synchronization are different: there's no
ProcSignalKind for this new kind of barrier any more, and there's now
only one pg_memory_barrier() and it's in a different place than Andres
had it. It is quite possible that I have screwed things up here, but I
couldn't make heads or tails of the way Andres had it. I don't know
whether that's because it was a POC or because I'm dumb. Also, moving
everything over into ProcSignal seemed to me to suggest some
rejiggering here that might've made less sense in the old scheme.

- I changed the code that waits for a barrier to use a latch wait,
because I think that's our usual convention. So wait-for-barrier gets
woken up early if the latch is set, and also itself manipulates the
latch. I also gave it a variable-length timeout, because when testing
it irked me that the same test randomly took either 0s or 1s. Now it
takes either 0s or 125ms, which is a lot less noticeable.

- I changed the kind of barrier provided by the patch from "checksum"
to "sample".

- Otherwise hacked on comments and naming somewhat.

This patch probably needs some more changes before commit. One likely
candidate: the use of a sample barrier type here probably needs to be
replaced with something else. But the way this is set up doesn't
really lend itself to starting with 0 types of barrier and then adding
them later, so I am not sure exactly what to do to make this patch
independent of what follows. The callback proposal I made before
could've accommodated that (as well as, perhaps, some automated
testing by dynamically loading up a barrier type) but Andres didn't
like that. Perhaps he'll relent, or have a counter-proposal, or
someone else will feel differently.

0005 is test code that sends and waits for a sample barrier.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v2-0005-Not-for-commit-test-code.patch application/octet-stream 1.5 KB
v2-0001-Move-interrupting-handling-code-into-subroutines.patch application/octet-stream 9.8 KB
v2-0002-Use-PostgresSigHupHandler-in-more-places.patch application/octet-stream 17.5 KB
v2-0003-Partially-deduplicate-interrupt-handling-for-back.patch application/octet-stream 37.9 KB
v2-0004-Extend-the-ProcSignal-mechanism-to-support-barrie.patch application/octet-stream 22.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-12-02 19:22:09 Re: surprisingly expensive join planning query
Previous Message Mark Dilger 2019-12-02 17:55:59 Re: Should we add xid_current() or a int8->xid cast?