Re: O(n) tasks cause lengthy startups and checkpoints

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Date: 2022-11-28 13:08:57
Message-ID: CANbhV-GXQ+owyk_jnowFRJEVX1TDgQrnQrgQb6C87V6jTecpZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 27 Nov 2022 at 23:34, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> Thanks for taking a look!
>
> On Thu, Nov 24, 2022 at 05:31:02PM +0000, Simon Riggs wrote:
> > * not sure I believe that everything it does can always be aborted out
> > of and shutdown - to achieve that you will need a
> > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
>
> I did something like this earlier, but was advised to simply let the
> functions finish as usual during shutdown [0]. I think this is what the
> checkpointer process does today, anyway.

If we say "The custodian is not an essential process and can shutdown
quickly when requested.", and yet we know its not true in all cases,
then that will lead to misunderstandings and bugs.

If we perform a restart and the custodian is performing extra work
that delays shutdown, then it also delays restart. Given the title of
the thread, we should be looking to improve that, or at least know it
occurred.

> > * not sure why you want immediate execution of custodian tasks - I
> > feel supporting two modes will be a lot harder. For me, I would run
> > locally when !IsUnderPostmaster and also in an Assert build, so we can
> > test it works right - i.e. running in its own process is just a
> > production optimization for performance (which is the stated reason
> > for having this)
>
> I added this because 0004 involves requesting a task from the postmaster,
> so checking for IsUnderPostmaster doesn't work. Those tasks would always
> run immediately. However, we could use IsPostmasterEnvironment instead,
> which would allow us to remove the "immediate" argument. I did it this way
> in v14.

Thanks

> > 0005 seems good from what I know
> > * There is no check to see if it worked in any sane time
>
> What did you have in mind? Should the custodian begin emitting WARNINGs
> after a while?

I think it might be useful if it logged anything that took an
"extended period", TBD.

Maybe that is already covered by startup process logging. Please tell
me that still works?

> > Rather than explicitly use DEBUG1 everywhere I would have an
> > #define CUSTODIAN_LOG_LEVEL LOG
> > so we can run with it in LOG mode and then set it to DEBUG1 with a one
> > line change in a later phase of Beta
>
> I can create a separate patch for this, but I don't think I've ever seen
> this sort of thing before.

Much of recovery is coded that way, for the same reason.

> Is the idea just to help with debugging during
> the development phase?

"Just", yes. Tests would be desirable also, under src/test/modules.

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-11-28 13:10:42 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Robert Haas 2022-11-28 12:58:00 Re: Reducing power consumption on idle servers