Re: O(n) tasks cause lengthy startups and checkpoints

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Date: 2022-02-17 23:12:47
Message-ID: 20220217231247.llfqjvvuk6rkawel@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-02-17 14:58:38 -0800, Nathan Bossart wrote:
> On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> > As far as I understand, the primary concern are logical decoding serialized
> > snapshots, because a lot of them can accumulate if there e.g. is an old unused
> > / far behind slot. It should be easy to reduce the number of those snapshots
> > by e.g. eliding some redundant ones. Perhaps we could also make backends in
> > logical decoding occasionally do a bit of cleanup themselves.
> >
> > I've not seen reports of the number of mapping files to be an real issue?
>
> I routinely see all four of these tasks impacting customers, but I'd say
> the most common one is the temporary file cleanup.

I took temp file cleanup and StartupReorderBuffer() "out of consideration" for
custodian, because they're not needed during normal running.

> Besides eliminating some redundant files and having backends perform some
> cleanup, what do you think about skipping the logical decoding cleanup
> during end-of-recovery/shutdown checkpoints?

I strongly disagree with it. Then you might never get the cleanup done, but
keep on operating until you hit corruption issues.

> > The improvements around deleting temporary files and serialized snapshots
> > afaict don't require a dedicated process - they're only relevant during
> > startup. We could use the approach of renaming the directory out of the way as
> > done in this patchset but perform the cleanup in the startup process after
> > we're up.
>
> Perhaps this is a good place to start. As I mentioned above, IME the
> temporary file cleanup is the most common problem, so I think even getting
> that one fixed would be a huge improvement.

Cool.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-02-17 23:39:52 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Previous Message David Rowley 2022-02-17 23:10:51 Re: Use generation context to speed up tuplesorts