optimizing pg_upgrade's once-in-each-database steps

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: optimizing pg_upgrade's once-in-each-database steps
Date: 2024-05-16 21:16:38
Message-ID: 20240516211638.GA1688936@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A number of pg_upgrade steps require connecting to each database and
running a query. When there are many databases, these steps are
particularly time-consuming, especially since this is done sequentially in
a single process. At a quick glance, I see the following such steps:

* create_logical_replication_slots
* check_for_data_types_usage
* check_for_isn_and_int8_passing_mismatch
* check_for_user_defined_postfix_ops
* check_for_incompatible_polymorphics
* check_for_tables_with_oids
* check_for_user_defined_encoding_conversions
* check_old_cluster_subscription_state
* get_loadable_libraries
* get_db_rel_and_slot_infos
* old_9_6_invalidate_hash_indexes
* report_extension_updates

I set out to parallelize these kinds of steps via multiple threads or
processes, but I ended up realizing that we could likely achieve much of
the same gain with libpq's asynchronous APIs. Specifically, both
establishing the connections and running the queries can be done without
blocking, so we can just loop over a handful of slots and advance a simple
state machine for each. The attached is a proof-of-concept grade patch for
doing this for get_db_rel_and_slot_infos(), which yielded the following
results on my laptop for "pg_upgrade --link --sync-method=syncfs --jobs 8"
for a cluster with 10K empty databases.

total pg_upgrade_time:
* HEAD: 14m 8s
* patch: 10m 58s

get_db_rel_and_slot_infos() on old cluster:
* HEAD: 2m 45s
* patch: 36s

get_db_rel_and_slot_infos() on new cluster:
* HEAD: 1m 46s
* patch: 29s

I am posting this early to get thoughts on the general approach. If we
proceeded with this strategy, I'd probably create some generic tooling that
each relevant step would provide a set of callback functions.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-parallel-get-relinfos.patch text/x-diff 11.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-05-16 21:18:01 Re: TerminateOtherDBBackends code comments inconsistency.
Previous Message David G. Johnston 2024-05-16 21:15:39 Re: commitfest.postgresql.org is no longer fit for purpose