Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-12-14 14:33:32
Message-ID: CAAJ_b96Fia+XQHvkpDcLGvhJ5iK0dZZnPZFoi69REWb2w6ga+w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 14, 2020 at 11:28 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Thu, Dec 10, 2020 at 6:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> > > That's not good. On a typical busy system, a system is going to be in
> > > the middle of a checkpoint most of the time, and the checkpoint will
> > > take a long time to finish - maybe minutes.
> >
> > Or hours, even. Due to the cost of FPWs it can make a lot of sense to
> > reduce the frequency of that cost...
> >
> >
> > > We want this feature to respond within milliseconds or a few seconds,
> > > not minutes. So we need something better here.
> >
> > Indeed.
> >
> >
> > > I'm inclined to think
> > > that we should try to CompleteWALProhibitChange() at the same places
> > > we AbsorbSyncRequests(). We know from experience that bad things
> > > happen if we fail to absorb sync requests in a timely fashion, so we
> > > probably have enough calls to AbsorbSyncRequests() to make sure that
> > > we always do that work in a timely fashion. So, if we do this work in
> > > the same place, then it will also be done in a timely fashion.
> >
> > Sounds sane, without having looked in detail.
> >
>
> Understood & agreed that we need to change the system state as soon as possible.
>
> I can see AbsorbSyncRequests() is called from 4 routines:
> CheckpointWriteDelay(), ProcessSyncRequests(), SyncPostCheckpoint() and
> CheckpointerMain(). Of these, the first three execute with interrupts on
> hold, which causes a problem when we emit a barrier and wait for it to be
> absorbed by all processes, including ourselves: the wait never ends.
> I think that can be fixed by teaching WaitForProcSignalBarrier()
> not to wait on self to absorb the barrier, and letting it be absorbed later,
> when interrupts are resumed. I assumed that we cannot do barrier
> processing right away since there could be other barriers (maybe in the future),
> including ours, that should not be processed while interrupts are on hold.
>

CreateCheckPoint() acquires the CheckpointLock LWLock at the start and releases
it at the end, which puts interrupts on hold for the whole checkpoint. It is
kind of surprising that we have been holding this lock, and keeping interrupts
on hold, for such a long time. We need CheckpointLock only to ensure that one
checkpoint happens at a time. Can't we do something simpler to ensure that
instead of taking the lock? Holding off interrupts for so long doesn't seem to
be a good idea. Thoughts/Suggestions?
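
To illustrate the kind of thing I mean, here is a rough standalone sketch, not
PostgreSQL code and not part of the patch. It assumes a plain C11 atomic flag
is enough to reject a second concurrent checkpoint; the names
checkpoint_in_progress, try_begin_checkpoint() and end_checkpoint() are
hypothetical and do not exist in the tree. Since no LWLock is taken, nothing
here holds off interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a flag that would live in shared memory.
 * This is an illustration only, not an existing PostgreSQL variable.
 */
atomic_bool checkpoint_in_progress = false;

/*
 * Try to become the only running checkpoint.  Returns true if we set the
 * flag, false if some other checkpoint already has it set.  Unlike an
 * LWLock acquisition, this neither blocks nor holds off interrupts.
 */
bool
try_begin_checkpoint(void)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&checkpoint_in_progress,
                                          &expected, true);
}

/*
 * Clear the flag once the checkpoint is done.  The flag must have been
 * set by a successful try_begin_checkpoint() call.
 */
void
end_checkpoint(void)
{
    bool was_running = atomic_exchange(&checkpoint_in_progress, false);

    assert(was_running);
    (void) was_running;     /* silence unused warning when NDEBUG is set */
}
```

A caller would skip or retry the checkpoint when try_begin_checkpoint() returns
false, and would have to make sure end_checkpoint() also runs on the error
path, which is something the LWLock machinery currently does for us.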

Regards,
Amul
