Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-10-28 11:43:38
Message-ID: CAAJ_b97pC5MRwG42c9StkZSNYXh9Lq8YittY8NZLcgUcB1LfjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 8, 2020 at 3:52 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Wed, Oct 7, 2020 at 11:19 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Wed, Sep 16, 2020 at 3:33 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > I don't mind a loop, but that one looks broken. We have to clear the
> > > bit before we call the function that processes that type of barrier.
> > > Otherwise, if we succeed in absorbing the barrier but a new instance
> > > of the same barrier arrives meanwhile, we'll fail to realize that we
> > > need to absorb the new one.
> >
> > Here's a new version of the patch for allowing errors in
> > barrier-handling functions and/or rejection of barriers by those
> > functions. I think this responds to all of the previous review
> > comments from Andres. Also, here is an 0002 which is a handy bit of
> > test code that I wrote. It's not for commit, but it is useful for
> > finding bugs.
> >
> > In addition to improving 0001 based on the review comments, I also
> > tried to write a better commit message for it, but it might still be
> > possible to do better there. It's a bit hard to explain the idea in
> > the abstract. For ALTER SYSTEM READ ONLY, the idea is that a process
> > with an XID -- and possibly a bunch of sub-XIDs, and possibly while
> > idle-in-transaction -- can elect to FATAL rather than absorbing the
> > barrier. I suspect for other barrier types we might have certain
> > (hopefully short) stretches of code where a barrier of a particular
> > type can't be absorbed because we're in the middle of doing something
> > that relies on the previous value of whatever state is protected by
> > the barrier. Holding off interrupts in those stretches of code would
> > prevent the barrier from being absorbed, but would also prevent query
> > cancel, backend termination, and absorption of other barrier types, so
> > it seems possible that just allowing the barrier-absorption function
> > for a barrier of that type to just refuse the barrier until after the
> > backend exits the critical section of code will work out better.
> >
> > Just for kicks, I tried running 'make installcheck-parallel' while
> > emitting placeholder barriers every 0.05 s after altering the
> > barrier-absorption function to always return false, just to see how
> > ugly that was. In round figures, it made it take 24 s vs. 21 s, so
> > it's actually not that bad. However, it all depends on how many times
> > you hit CHECK_FOR_INTERRUPTS() how quickly, so it's easy to imagine
> > that the effect might be very non-uniform. That is, if you can get the
> > code to be running a tight loop that does little real work but does
> > CHECK_FOR_INTERRUPTS() while refusing to absorb outstanding type of
> > barrier, it will probably suck. Therefore, I'm inclined to think that
> > the fairly strong cautionary logic in the patch is reasonable, but
> > perhaps it can be better worded somehow. Thoughts welcome.
> >
> > I have not rebased the remainder of the patch series over these two.
> >
> That I'll do.
>

Attaching a rebased version includes Robert's patches for the latest master
head.

> On a quick look at the latest 0001 patch, the following hunk to reset leftover
> flags seems to be unnecessary:
>
> + /*
> + * If some barrier types were not successfully absorbed, we will have
> + * to try again later.
> + */
> + if (!success)
> + {
> + ResetProcSignalBarrierBits(flags);
> + return;
> + }
>
> When the ProcessBarrierPlaceholder() function returns false without an error,
> that barrier flag gets reset within the while loop. The case when it has an
> error, the rest of the flags get reset in the catch block. Correct me if I am
> missing something here.
>

Robert, could you please confirm this?

Regards,
Amul

Attachment Content-Type Size
v10-0001-Allow-for-error-or-refusal-while-absorbing-barri.patch application/x-patch 7.5 KB
v10-0005-Error-or-Assert-before-START_CRIT_SECTION-for-WA.patch application/x-patch 70.1 KB
v10-0002-Test-module-for-barriers.-NOT-FOR-COMMIT.patch application/x-patch 3.9 KB
v10-0003-Add-alter-system-read-only-write-syntax.patch application/x-patch 9.3 KB
v10-0004-Implement-ALTER-SYSTEM-READ-ONLY-using-global-ba.patch application/x-patch 41.8 KB
v10-0006-WIP-Documentation.patch application/x-patch 6.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-10-28 11:50:46 Re: A new function to wait for the backend exit after termination
Previous Message Pavel Borisov 2020-10-28 11:37:53 Valgrind run error with Postgres on OSX