Re: Horribly slow pg_upgrade performance with many Large Objects

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Horribly slow pg_upgrade performance with many Large Objects
Date: 2025-04-13 20:17:59
Message-ID: CAMT0RQQcOo3swiVnuUQso+CNr9Rk1vLL=5ceDnRDGURmatDuCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

And in case there *is* ACL present then each user mentioned in the ACL
adds more overhead

Also the separate GRANT calls cause bloat as the
pg_largeoject_metadata row gets updated for each ALTER USER or GRANT

The following is for 10 million LOs with 1 and 3 users being GRANTed
SELECT on each object (with no grants the pg_restore run was 10
minutes)

Nr of GRANTS | pg_dump time | pg_restore time
--------------+--------------+----------------
0 | 0m 10s | 10m 5s
1 | 0m 17s | 15m 3s
3 | 0m 21s | 27m 15s

NB! - I left out the --verbose flag from pg_dump as used by
pg_upgrade, as it will emit one line per LO dumped

## 1 GRANT / LO

hannuk(at)db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb10m.dump -p 5433 lodb10m
real 0m17.022s
user 0m2.956s
sys 0m1.453s

hannuk(at)db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m.dump
real 15m3.136s
user 0m28.991s
sys 2m54.164s

## 3 GRANTs / LO

make sample LO with 3 grants

ALTER LARGE OBJECT 1 OWNER TO "hannuk";
GRANT SELECT ON LARGE OBJECT 1 TO "bob";
GRANT SELECT ON LARGE OBJECT 1 TO "joe";
GRANT SELECT ON LARGE OBJECT 1 TO "tom";
lodb10m=# select * from pg_shdepend where objid = 1;

┌───────┬─────────┬───────┬──────────┬────────────┬──────────┬─────────┐
│ dbid │ classid │ objid │ objsubid │ refclassid │ refobjid │ deptype │
├───────┼─────────┼───────┼──────────┼────────────┼──────────┼─────────┤
│ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16384 │ o │
│ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16393 │ a │
│ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16394 │ a │
│ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16395 │ a │
└───────┴─────────┴───────┴──────────┴────────────┴──────────┴─────────┘

lodb10m=# select * from pg_largeobject_metadata ;
┌─────┬──────────┬───────────────────────────────────────────────────────────┐
│ oid │ lomowner │ lomacl │
├─────┼──────────┼───────────────────────────────────────────────────────────┤
│ 1 │ 16384 │ {hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk} │
└─────┴──────────┴───────────────────────────────────────────────────────────┘

Make the remaining 10M-1 LOs

lodb10m=# insert into pg_largeobject_metadata(oid, lomowner, lomacl)
SELECT i, 16384,
'{hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk}' FROM
generate_series(2, 10_000_000) g(i);
INSERT 0 9999999
Time: 18859.341 ms (00:18.859)

And add their sharedeps

lodb10m=# WITH refdeps (robj, rdeptype)
AS ( VALUES
(16384, 'o'),
(16393, 'a'),
(16394, 'a'),
(16395, 'a')
)
INSERT INTO pg_shdepend SELECT 16396, 2613, i, 0, 1260, robj, rdeptype
FROM generate_series(2, 10_000_000) g(i)
, refdeps
;
INSERT 0 39999996
Time: 116697.342 ms (01:56.697)

Time pg_upgrade's pg_dump and pg_reload

hannuk(at)db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb10m-3grants.dump -p 5433 lodb10m
real 0m21.519s
user 0m2.951s
sys 0m1.723s

hannuk(at)db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb10m
lodb10m-3grants.dump
real 27m15.372s
user 0m45.157s
sys 4m57.513s

On Fri, Apr 11, 2025 at 10:11 PM Nathan Bossart
<nathandbossart(at)gmail(dot)com> wrote:
>
> On Tue, Apr 08, 2025 at 12:22:00PM -0500, Nathan Bossart wrote:
> > On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote:
> >> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> >>> I do think it's worth considering going back to copying
> >>> pg_largobject_metadata's files for upgrades from v16 and newer.
> >>
> >> (If we do this) I don't see why we'd need to stop at v16. I'm
> >> envisioning that we'd use COPY, which will be dealing in the
> >> text representation of aclitems, and I don't think that's changed
> >> in a long time. The sort of thing that would break it is changes
> >> in the set of available/default privilege bits for large objects.
> >
> > I was thinking of actually reverting commit 12a53c7 for upgrades from v16,
> > which AFAICT is the last release where any relevant storage formats changed
> > (aclitem changed in v16). But if COPY gets us pretty close to that and is
> > less likely to be disrupted by future changes, it could be a better
> > long-term approach.
> >
> >> That is, where the dump currently contains something like
> >>
> >> SELECT pg_catalog.lo_create('2121');
> >> ALTER LARGE OBJECT 2121 OWNER TO postgres;
> >> GRANT ALL ON LARGE OBJECT 2121 TO joe;
> >>
> >> we'd have
> >>
> >> COPY pg_largeobject_metadata FROM STDIN;
> >> ...
> >> 2121 10 {postgres=rw/postgres,joe=rw/postgres}
> >> ...
> >>
> >> and some appropriate COPY data for pg_shdepend too.
>
> I did some more research here. For many large objects without ACLs to
> dump, I noticed that the vast majority of time is going to restoring the
> ALTER OWNER commands. For 1 million such large objects, restoring took ~73
> seconds on my machine. If I instead invented an lo_create_with_owner()
> function and created 100 per SELECT command, the same restore takes ~7
> seconds. Copying the relevant pg_shdepend rows out and back in takes ~2.5
> seconds. I imagine using COPY for pg_largeobject_metadata would also take
> a couple of seconds in this case.
>
> For upgrading, I don't think there's any huge benefit to optimizing the
> restore commands versus using COPY. It might make future catalog changes
> for large object stuff easier, but I'd expect those to be rare. However,
> the optimized restore commands could be nice for non-pg_upgrade use-cases.
>
> --
> nathan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2025-04-13 20:37:58 Re: datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE
Previous Message Dmitry Dolgov 2025-04-13 19:53:43 Re: Proposal: Adding compression of temporary files