Re: O(n) tasks cause lengthy startups and checkpoints

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Date: 2022-02-17 22:58:38
Message-ID: 20220217225838.GA3292179@nathanxps13
Lists: pgsql-hackers

On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> As far as I understand, the primary concern is logical decoding serialized
> snapshots, because a lot of them can accumulate if there e.g. is an old unused
> / far behind slot. It should be easy to reduce the number of those snapshots
> by e.g. eliding some redundant ones. Perhaps we could also make backends in
> logical decoding occasionally do a bit of cleanup themselves.
>
> I've not seen reports of the number of mapping files being a real issue?

I routinely see all four of these tasks impacting customers, but I'd say
the most common one is the temporary file cleanup. Besides eliminating
some redundant files and having backends perform some cleanup, what do you
think about skipping the logical decoding cleanup during
end-of-recovery/shutdown checkpoints? This was something that Bharath
brought up a while back [0]. As I noted in that thread, startup and
shutdown could still take a while if checkpoints are regularly delayed due
to logical decoding cleanup, but skipping the cleanup in those checkpoints
might still help avoid a bit of downtime.
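
To make the suggestion concrete, here is a rough sketch of the decision I
have in mind. It is Python rather than C purely for brevity, the names
CheckpointFlags and should_clean_logical_decoding_files are hypothetical,
and the flag values are illustrative stand-ins, not taken from any patch.

```python
from enum import IntFlag


class CheckpointFlags(IntFlag):
    # Illustrative stand-ins for the checkpoint request flags; the
    # numeric values here are assumptions made for this sketch.
    IS_SHUTDOWN = 0x0001
    END_OF_RECOVERY = 0x0002
    IMMEDIATE = 0x0004


def should_clean_logical_decoding_files(flags):
    """Return True if this checkpoint should do the logical decoding cleanup.

    Shutdown and end-of-recovery checkpoints skip it, so that the O(n)
    directory scan does not add to downtime; any other checkpoint
    still performs the cleanup.
    """
    skip = CheckpointFlags.IS_SHUTDOWN | CheckpointFlags.END_OF_RECOVERY
    return not (flags & skip)
```

With this shape, the files left behind by a skipped cleanup would simply be
picked up by the next regular checkpoint.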

> The improvements around deleting temporary files and serialized snapshots
> afaict don't require a dedicated process - they're only relevant during
> startup. We could use the approach of renaming the directory out of the way as
> done in this patchset but perform the cleanup in the startup process after
> we're up.

Perhaps this is a good place to start. As I mentioned above, IME the
temporary file cleanup is the most common problem, so I think even getting
that one fixed would be a huge improvement.
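
For illustration, the rename-then-clean-up-later idea could look roughly
like the sketch below. Again this is Python for brevity, not the patchset's
code: stage_for_deletion and cleanup_later are hypothetical names, the
".stale.<pid>" suffix is an assumption, and the background thread merely
stands in for whichever process ends up doing the deferred work.

```python
import os
import shutil
import threading


def stage_for_deletion(tmp_dir):
    """Move tmp_dir out of the way and recreate it empty.

    A rename is a single cheap operation regardless of how many files
    the directory holds, so startup is not blocked on O(n) unlinks.
    Returns the path of the staged (renamed) directory.
    """
    staged = f"{tmp_dir}.stale.{os.getpid()}"  # hypothetical naming scheme
    os.rename(tmp_dir, staged)
    os.mkdir(tmp_dir)
    return staged


def cleanup_later(staged):
    """Delete the staged directory in the background; returns the thread."""
    worker = threading.Thread(
        target=shutil.rmtree, args=(staged,), kwargs={"ignore_errors": True}
    )
    worker.start()
    return worker
```

The caller would invoke stage_for_deletion() during startup and
cleanup_later() once the server is accepting connections.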

[0] https://postgr.es/m/CALj2ACXkkSL8EBpR7m%3DMt%3DyRGBhevcCs3x4fsp3Bc-D13yyHOg%40mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
