O(n) tasks cause lengthy startups and checkpoints

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: O(n) tasks cause lengthy startups and checkpoints
Date: 2021-12-01 20:24:25
Message-ID: C1EE64B0-D4DB-40F3-98C8-0CED324D34CB@amazon.com
Lists: pgsql-hackers

Hi hackers,

Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to
avoid individually syncing all database files after a crash. However,
as noted earlier this year [0], there are still a number of O(n) tasks
that affect startup and checkpointing that I'd like to improve.
Below, I've attempted to summarize each task and to offer ideas for
improving matters. I'll likely split each of these into its own
thread, provided there is community interest in such changes.

1) CheckPointSnapBuild(): This function loops through
pg_logical/snapshots to remove all snapshots that are no longer
needed. If there are many entries in this directory, this can take
a long time. The note above this function indicates that this is
done during checkpoints simply because it is convenient. IIUC
there is no requirement that this function actually completes for a
given checkpoint. My current idea is to move this to a new
maintenance worker.
2) CheckPointLogicalRewriteHeap(): This function loops through
pg_logical/mappings to remove old mappings and flush all remaining
ones. IIUC there is no requirement that the "remove old mappings"
part must complete for a given checkpoint, but the "flush all
remaining" portion allows replay after a checkpoint to only "deal
with the parts of a mapping that have been written out after the
checkpoint started." Therefore, I think we should move the "remove
old mappings" part to a new maintenance worker (probably the same
one as for 1), and we should consider using syncfs() for the "flush
all remaining" part. (I suspect the main argument against the
latter will be that it could cause IO spikes.)
3) RemovePgTempFiles(): This step can delay startup if there are many
temporary files to individually remove. This step is already
optionally done after a crash via the remove_temp_files_after_crash
GUC. I propose that we have startup move the temporary file
directories aside and create new ones, and then a separate worker
(probably the same one from 1 and 2) could clean up the old files.
4) StartupReorderBuffer(): This step deletes logical slot data that
has been spilled to disk. This code appears to be written to avoid
deleting other types of files in these directories, but AFAICT
there shouldn't be any other files. Therefore, I think we could do
something similar to 3 (i.e., move the directories aside during
startup and clean them up via a new maintenance worker).
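
To make the "move aside, clean up later" idea from 3 and 4 a bit more
concrete, here is a rough standalone sketch in plain POSIX C. It is not
PostgreSQL code: the function names move_aside_and_recreate() and
remove_tree() and the ".stale" suffix are made up for illustration, and
a real patch would presumably use the backend's own file APIs, pick a
unique name for the old directory, and handle fsync/durability, all of
which is skipped here. The point is only that the startup side becomes
a rename plus a mkdir, while the O(n) deletion moves into whatever
process calls remove_tree() afterwards.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Startup side (hypothetical helper): rename "dir" to "dir.stale" and
 * create a fresh, empty "dir".  The cost does not depend on how many
 * files the old directory holds.  The stale path is written to "stale".
 * Returns 0 on success, -1 on failure with errno set.
 *
 * Assumption: no non-empty "dir.stale" is left over from an earlier
 * run; rename() would fail in that case.
 */
static int
move_aside_and_recreate(const char *dir, char *stale, size_t stale_len)
{
    int n;

    assert(dir != NULL && stale != NULL);

    n = snprintf(stale, stale_len, "%s.stale", dir);
    if (n < 0 || (size_t) n >= stale_len)
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (rename(dir, stale) != 0)
        return -1;
    return mkdir(dir, 0700);
}

/*
 * Worker side (hypothetical helper): recursively delete "path".  This
 * is the O(n) part that no longer runs during startup.  It keeps going
 * after individual failures and returns -1 if anything went wrong.
 */
static int
remove_tree(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *de;
    int rc = 0;

    if (d == NULL)
        return -1;

    while ((de = readdir(d)) != NULL)
    {
        char child[4096];
        struct stat st;
        int n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        n = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
        if (n < 0 || (size_t) n >= sizeof(child) || lstat(child, &st) != 0)
        {
            rc = -1;
            continue;
        }
        if (S_ISDIR(st.st_mode))
        {
            if (remove_tree(child) != 0)
                rc = -1;
        }
        else if (unlink(child) != 0)
            rc = -1;
    }
    closedir(d);

    if (rmdir(path) != 0)
        rc = -1;
    return rc;
}
```

In this sketch, startup would call move_aside_and_recreate() once per
temporary-file or spill directory and hand the resulting ".stale" path
to the maintenance worker, which would call remove_tree() on it at its
leisure.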

I realize adding a new maintenance worker might be a bit heavy-handed,
but I think it would be nice to have somewhere to offload tasks that
really shouldn't impact startup and checkpointing. I imagine such a
process would come in handy down the road, too. WDYT?

Nathan

[0] https://postgr.es/m/32B59582-AA6C-4609-B08F-2256A271F7A5%40amazon.com
