Re: Horribly slow pg_upgrade performance with many Large Objects

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Horribly slow pg_upgrade performance with many Large Objects
Date: 2025-07-06 12:48:08
Message-ID: CAMT0RQTXiqH7zdQEVSVd2L7_Cw4wQ1eHOD8hfZ+0vecMXJWc-w@mail.gmail.com
Lists: pgsql-hackers

Hi Nathan,

I did a quick check of the patch and it seems to work OK.

What do you think of the idea of not dumping pg_shdepend here, but
instead adding the required entries after loading
pg_largeobject_metadata, based on its contents?

The query for this would be:

WITH db AS (
    SELECT oid FROM pg_database WHERE datname = current_database()
)
INSERT INTO pg_shdepend
SELECT db.oid AS dbid,
       2613 AS classid,                               -- pg_largeobject
       lm.oid AS objid,
       0 AS objsubid,
       1260 AS refclassid,                            -- pg_authid
       COALESCE(acl.grantee, lm.lomowner) AS refobjid,
       CASE WHEN acl.grantee IS NULL OR acl.grantee = lm.lomowner
            THEN 'o'                                  -- owner dependency
            ELSE 'a'                                  -- ACL dependency
       END AS deptype
FROM pg_largeobject_metadata AS lm
LEFT JOIN LATERAL (
    SELECT DISTINCT (aclexplode(lm.lomacl)).grantee
) AS acl ON true,
db
WHERE (lm.oid, COALESCE(acl.grantee, lm.lomowner)) NOT IN
      (SELECT objid, refobjid FROM pg_shdepend WHERE dbid = db.oid);

(I had hoped to use ON CONFLICT DO NOTHING, but that is not supported
for system tables, hence the NOT IN guard above.)
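
For reference, here is the kind of sanity check I would run after the
INSERT (my own sketch, not part of the patch): compare the deptype
counts for large objects against the same query run on the old cluster.

SELECT s.deptype, count(*)
FROM pg_shdepend AS s
JOIN pg_database AS d ON d.oid = s.dbid
WHERE d.datname = current_database()
  AND s.classid = 2613   -- pg_largeobject, matching the INSERT above
GROUP BY s.deptype;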

--
Hannu

On Wed, May 7, 2025 at 4:51 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Mon, May 05, 2025 at 02:23:25PM -0500, Nathan Bossart wrote:
> > That leaves pg_shdepend. For now, I've just instructed pg_upgrade to COPY
> > the relevant pg_shdepend rows as an independent step, but perhaps there's a
> > reasonably straightforward way to put that in pg_dump, too.
>
> It turns out there is. TableDataInfo's filtercond field can be used to
> easily add a WHERE clause to the data dumping command. On my laptop,
> upgrading with --jobs=8 with 10M large objects evenly distributed across 10
> databases (each with a non-bootstrap-superuser owner and another role with
> select rights) takes ~100 seconds without this patch and ~30 seconds with
> it.
>
> I've also added dependency tracking, version checks (this only works for
> upgrades from >=v12 for now), a hack to ensure the columns for
> pg_largeobject_metadata/pg_shdepend are collected, and comments. I'm sure
> there's something I've missed, but this patch has worked well in my tests
> thus far.
>
> Taking a step back, I'm a little disappointed in the gains here. A 3-9x
> speedup is nice, but I guess I was hoping to find another order of
> magnitude somewhere. To do any better, I think we'd need to copy the files
> for pg_largeobject_metadata directly for upgrades from >= v16, but that
> would have to fit somewhere between when pg_restore creates the database
> and when it restores any large object comments/seclabels. I'm not wild
> about the amount of hackery required to get that working.
>
> --
> nathan
