Re: global / super barriers (for checksums)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: global / super barriers (for checksums)
Date: 2019-12-09 15:42:38
Message-ID: CA+TgmoYeH_iGH0Hb2uqi7SZK2DOz8wMXDunmftcq5YkZpwdjGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 6, 2019 at 12:17 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > 0001 is just a code movement patch. It puts the interrupt handling for
> > each type of background process into a subroutine, instead of having
> > it all inline in the main loop. I feel this makes things more clear.
> > Hopefully it's uncontroversial.
>
> I'd suggest marking the shutdown functions as noreturn. Might prevent
> some compiler warnings in the future.

That patch only adds one shutdown function, and it's already marked as
noreturn, because it prevents a compiler warning in the present. So I
don't know what you're asking for here.

> > 0002 and 0003 are optional and slightly off-topic as far as the matter
> > at hand, but they are not entirely unrelated and I believe that they
> > are good cleanups. 0002 changes assorted background processes to use
> > PostgresSigHupHandler and ConfigReloadPending rather than having a
> > bunch of separate global variables and a bunch of separate signal
> > handlers that basically do the same stuff.
>
> Hm. Why "Postgres*"? I think that's confusing. So far only backend like
> things are named like that, and I'd e.g. not call checkpointer that.

Well, that's the existing function name, and 0003 changes it anyway.

> It's probably fine, but it's worthwhile to consider that that checking an
> external variable like ConfigReloadPending is more expensive than a
> static var like got_SIGHUP. The compiler will most of the time
> understand the code flow to understand when something like got_SIGHUP
> can change, but it can't assume anything like that with
> ConfigReloadPending. But as we're forcing the compilers hand, by marking
> the variable as volatile, so it's not likely to be a meaningful difference.

Yeah, I think it's pretty much certain to be in the noise.

> > 0003 takes this a step
> > further by trying to unify a bunch more error-handling across
> > different background process types. Together, 0002 and 0003 save >300
> > lines of code and as far as I can see there's basically no downside. I
> > think in general it would be good to strive towards a future where
> > everybody used the same signal handling and interrupt handling, but
> > these patches have the much less ambitious goal of just removing
> > overtly duplicated code. I think they might act as an inducement to
> > future patch authors to try to make things look more alike, though.
>
> I do agree that this is a good goal. The amount of redundant code we
> have across different types of processes is utterly ridiculous. While
> obviously not a significant problem, there's also

You seem to have omitted the end of this sentence.

> > 0004 is the main patch. It is substantially revised from Andres's
> > version, but the basic principle is along similar lines. Changes:
> >
> > - I felt that it was imprudent to put some of the machinery into
> > PGPROC and the rest into the ProcSignal mechanism, because there is
> > nothing that says that everything with a PGPROC must also be a
> > ProcSignal participant. So I moved everything over into the ProcSignal
> > mechanism. This seems a lot safer to me.
>
> I don't fully remember how I ended up with that split. But I think there
> is an argument that putting the generation in PGPROC is good: That way
> there are fewer changes to the shared cachelines in the ProcSignal. As I
> think there'll be more modifications to barrierGen/barrierFlags (as I'd
> named them) than the number of times they're read, that could be
> beneficial in keeping the overhead low.
>
> It's probably still worthwhile to go the way you have.

I think so. I think there's just too much chance for error/frustration
in having it in two separate mechanisms that might end up out of sync
with each other. We could pad the stuff within ProcSignal separately,
if required, but I'm hoping we're not going to be using this for
properties that change frequently.

> > - There details of the synchronization are different: there's no
> > ProcSignalKind for this new kind of barrier any more
>
> This I do not get. What's the point in making this unconditional?

I wasn't sure about this part, but Occam's razor argues for fewer
moving parts. As I have it, you need to make sure that the signal
handler is going to see our proc's generation as lower than the global
generation, and that when we go to process the interrupt (no longer in
the signal handler) we're going to be able to see the state changes
that we need to absorb. Now, if you also have a ProcSignal thing, then
you have three things to synchronize. When you see the flag set, you
need to be able to see the generations as unequal, and when you see
the generations as unequal, then you need to be able to see the state
changes that you need to absorb. That seems more complex, but it might
be worth it if the cost of comparing the barrier generations is too
much.

> > +uint64
> > +EmitProcSignalBarrier(ProcSignalBarrierType type)
> > +{
> > + uint64 flagbit = UINT64CONST(1) << (uint64) type;
> > + uint64 generation;
> > +
> > + /*
> > + * Set all the flags.
> > + *
> > + * Note that pg_atomic_fetch_or_u32 has full barrier semantics, so this
> > + * is totally ordered with respect to anything the caller did before, and
> > + * anything that we do afterwards. (This is also true of the later call
> > + * to pg_atomic_add_fetch_u64.)
> > + */
> > + for (int i = NumProcSignalSlots - 1; i >= 0; i--)
> > + {
> > + volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
> > +
> > + pg_atomic_fetch_or_u32(&slot->pss_barrierCheckMask, flagbit);
> > + }
>
> What's the reason for doing this in reverse? SendProcSignal() does it
> because short circuits the search once a match is found, and it
> theorizes they're at the end, but that doesn't apply here?

Mmm, this was just copy and paste. I can switch it around.

> It could potentially be worthwhile to check whether all backends already
> have the flag set, and in that case not increment the barrier
> generation? Probably not worth it for now.

I don't think it's worth it now, or maybe ever. It seems unlikely to
come up, and it makes the synchronization more complex.

> > +void
> > +WaitForProcSignalBarrier(uint64 generation)
> > +{
> > + long timeout = 125L;
> > +
> > + for (int i = NumProcSignalSlots - 1; i >= 0; i--)
> > + {
> > + volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
> > + uint64 oldval;
> > +
> > + oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
> > + while (oldval < generation)
>
> Hm. without a barrier here, is it guaranteed that we can rely on the
> state we need? IOW, could it be that we see an update to
> pss_barrierGeneration to our desired generation, but still see
> "published" state by some other backend that isn't yet in accordance to
> the the generation? Seems we ought to have at least a read barrier
> somewhere?

Yeah, I think now that you mention it it should be a full barrier. The
caller might see that the barrier has been absorbed and immediately
perform either a read or a write.

> Hm. In my mental model it would be useful for barrier "processors" to
> not acknowledge the state change at certain points. Imagine e.g. needing
> to efficiently wait till all backends have processed a config file
> reload - since we don't reload while a query is being processed, we
> should be able to not ack the barrier at that point.

Yeah, you mentioned this before, but I think we could leave it for
later. I think it's going to be complex. I suppose the way to do it
would be to add an argument to ProcessProcSignalBarrier() that lets
you specify which barrier events you're OK with processing from the
current point in the code. However, that will mean that whenever
somebody adds a new barrier type, they have got to go through all of
those callers and think about whether they should add their new
barrier type into the flags or not. If we try to do the specific thing
you're talking about with config-file processing, it will also mean
that we could be waiting MUCH longer for acknowledgements, which I
think might have pretty far-reaching ramifications. You might get
stuck waiting for that barrier to be absorbed for hours, and that
would also impact later barriers, since you won't see the generation
bump to 42 until it goes through 40 and 41. That sounds fairly awful.
You could try to work around it by looking at which barrier flags are
set, but I think that has ABA problems.

It seems sufficient to me for now to come up with something that can
be used for enabling checksums online and for the thing that got me
motivated to work on this, which is ALTER SYSTEM READ ONLY. If we can
meet the requirements of those two patches with this, I think we
should be happy. And I don't think we need the feature you're talking
about here for either of those use cases. If there's some simple
change you want to propose, we can certainly talk about that, but it
looks to me like you're talking about adding a fairly complex
mechanism that wouldn't actually get used for anything right away, and
I'd rather not go there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v3-0004-Extend-the-ProcSignal-mechanism-to-support-barrie.patch application/octet-stream 22.6 KB
v3-0005-Not-for-commit-test-code.patch application/octet-stream 1.5 KB
v3-0002-Use-PostgresSigHupHandler-in-more-places.patch application/octet-stream 17.5 KB
v3-0003-Partially-deduplicate-interrupt-handling-for-back.patch application/octet-stream 37.9 KB
v3-0001-Move-interrupting-handling-code-into-subroutines.patch application/octet-stream 9.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-12-09 15:50:00 Re: [Proposal] Level4 Warnings show many shadow vars
Previous Message Jim Finnerty 2019-12-09 15:38:52 Re: verbose cost estimate