Re: global / super barriers (for checksums)

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: global / super barriers (for checksums)
Date: 2019-07-10 13:31:11
Message-ID: CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> Magnus cornered me at pgconf.eu and asked me whether I could prototype
> the "barriers" I'd been talking about in the online checksumming thread.
>
> The problem there was to make sure that all processes, backends and
> auxiliary processes have seen the new state of checksums being enabled,
> and aren't currently in the process of writing a new page out.
>
> The current prototype solves that by requiring a restart, but that
> strikes me as a far too large hammer.
>
> The attached patch introduces "global barriers" (name was invented in an
> overcrowded hotel lounge, so ...), which allow waiting for such a change
> to be absorbed by all backends.
>
> I've only tested the code with gdb, but that seems to work:
>
> p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))
>
> waits until all backends (including bgwriter, checkpointers, walwriters,
> bgworkers, ...) have accepted interrupts at least once. Multiple such
> requests are coalesced.
>
> I decided to wait until interrupts are actually processed, rather than
> just the signal received, because that means the system is in a well
> defined state. E.g. there are no pages currently being written out.
>
> For the checksum enablement patch you'd do something like:
>
> EnableChecksumsInShmemWithLock();
> WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
>
> and after that you should be able to set it to a persistent mode.
>
>
> I chose to use procsignals to send the signals, a global uint64
> globalBarrierGen, and per-backend barrierGen, barrierFlags, with the
> latter keeping track of which barriers have been requested. There
> likely are other use cases.
>
>
> The patch definitely is in a prototype stage. At the very least it needs
> a high-level comment somewhere, and some of the lower-level code needs
> to be cleaned up.
>
> One thing I wasn't happy about is how checksum internals have to absorb
> barrier requests - that seems unavoidable, but I'd hope for something
> more global than just BufferSync().
>
>
> Comments?
>
>

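For reference, the generation-counter scheme described in the quoted
message (a global globalBarrierGen plus per-backend barrierGen and
barrierFlags) can be sketched roughly as below. This is not code from the
patch: every sketch_* / SKETCH_* name is hypothetical, C11 atomics stand in
for PostgreSQL's own atomics, signal delivery is left out entirely, and the
"backends" are just array slots that the caller absorbs by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is taken from the patch. */
#define SKETCH_NBACKENDS 4
#define SKETCH_BARRIER_CHECKSUM (1u << 0)

typedef struct SketchBackendSlot
{
    _Atomic uint64_t barrierGen;    /* last generation this backend absorbed */
    _Atomic uint32_t barrierFlags;  /* barrier kinds requested, not yet absorbed */
} SketchBackendSlot;

static _Atomic uint64_t sketchGlobalBarrierGen;
static SketchBackendSlot sketchSlots[SKETCH_NBACKENDS];

/*
 * Request a barrier: flag every slot first, then bump the global
 * generation.  A real implementation would now signal every process.
 * Returns the generation the caller has to wait for.
 */
uint64_t
sketch_emit_barrier(uint32_t flags)
{
    for (int i = 0; i < SKETCH_NBACKENDS; i++)
        atomic_fetch_or(&sketchSlots[i].barrierFlags, flags);
    return atomic_fetch_add(&sketchGlobalBarrierGen, 1) + 1;
}

/*
 * Called by a backend at a point where it processes interrupts, i.e.
 * where it is in a well defined state.  Returns the flags it absorbed.
 */
uint32_t
sketch_absorb_barrier(int backend)
{
    assert(backend >= 0 && backend < SKETCH_NBACKENDS);
    SketchBackendSlot *slot = &sketchSlots[backend];
    uint64_t    global_gen = atomic_load(&sketchGlobalBarrierGen);

    /* Nothing new since the last absorb: leave any pending flags alone. */
    if (atomic_load(&slot->barrierGen) == global_gen)
        return 0;

    uint32_t    flags = atomic_exchange(&slot->barrierFlags, 0);

    /* ... react to each requested barrier kind here ... */

    /* Publish only after the reaction is done. */
    atomic_store(&slot->barrierGen, global_gen);
    return flags;
}

/* True once every backend has absorbed generation 'gen' or a later one. */
bool
sketch_barrier_complete(uint64_t gen)
{
    for (int i = 0; i < SKETCH_NBACKENDS; i++)
        if (atomic_load(&sketchSlots[i].barrierGen) < gen)
            return false;
    return true;
}
```

A waiter would loop on sketch_barrier_complete() with some form of sleep in
between. Because a backend publishes the newest generation it saw, several
requests emitted before it next absorbs are satisfied by a single absorb,
which is one way to get the coalescing mentioned above.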
Finally getting back to this one.

In re-reading this, I notice there are a lot of references to Intterrupt
(with two t). I'm guessing this is just a spelling error, and not something
that actually conveys some meaning?

Can you elaborate on what you mean by:
+ /* XXX: need a more principled approach here */

Is that the thing you refer to above about "checksum internals"?

Also, while checking, we figured it'd be nice to have a wait event for
this, since a process can potentially get stuck in an infinite loop waiting
for some other process that is misbehaving. Kind of like the attached?

//Magnus

Attachment Content-Type Size
barrier_wait_events.patch text/x-patch 1.7 KB
