Re: pg_upgrade failing for 200+ million Large Objects

From: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jacob Champion <champion(dot)p(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Bruce Momjian <bruce(at)momjian(dot)us>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robins Tharakan <tharakan(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: 2023-12-04 16:07:59
Message-ID: 83D44BE5-0088-4D41-8AE6-20A05D026F46@amazon.com
Lists: pgsql-hackers

> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> FWIW, I agree with Jacob's concern about it being a bad idea to let
> users of pg_upgrade pass down arbitrary options to pg_dump/pg_restore.
> I think we'd regret going there, because it'd hugely expand the set
> of cases pg_upgrade has to deal with.

> Also, pg_upgrade is often invoked indirectly via scripts, so I do
> not especially buy the idea that we're going to get useful control
> input from some human somewhere. I think we'd be better off to
> assume that pg_upgrade is on its own to manage the process, so that
> if we need to switch strategies based on object count or whatever,
> we should put in a heuristic to choose the strategy automatically.
> It might not be perfect, but that will give better results for the
> pretty large fraction of users who are not going to mess with
> weird little switches.

I have updated the patch to use a heuristic. During pg_upgrade we count
large objects per database. During pg_restore execution, if a database's
large-object count is greater than LARGE_OBJECTS_THRESOLD (1k), we use
--restore-blob-batch-size.
I also modified the pg_upgrade --jobs behavior for databases that have more
large objects than LARGE_OBJECTS_THRESOLD:

+ /* Restore all the dbs where LARGE_OBJECTS_THRESOLD is not breached */
+ restore_dbs(stats, true);
+ /* reap all children */
+ while (reap_child(true) == true)
+ ;
+ /* Restore rest of the dbs one by one with pg_restore --jobs = user_opts.jobs */
+ restore_dbs(stats, false);
/* reap all children */
while (reap_child(true) == true)
;
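
To illustrate the split described above, here is a minimal sketch of how the databases could be partitioned into the two restore passes. It is not taken from the patch: the DbStat struct and the exceeds_threshold() and select_pass() helpers are hypothetical names, and the threshold value of 1000 is only my reading of "1k".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value; the message only says "1k". */
#define LARGE_OBJECTS_THRESOLD 1000

/* Hypothetical per-database record, not the struct used by the patch. */
typedef struct DbStat
{
	const char *name;
	long		large_objects;	/* large objects counted in this database */
} DbStat;

/* True if the database has more large objects than the threshold. */
static bool
exceeds_threshold(const DbStat *db)
{
	return db->large_objects > LARGE_OBJECTS_THRESOLD;
}

/*
 * Collect the indexes of the databases that belong to one restore pass.
 * below_threshold = true selects databases that do not exceed the threshold
 * (first pass); false selects the remaining ones (second pass, restored one
 * by one). Returns the number of indexes written to out, which must have
 * room for ndbs entries.
 */
static size_t
select_pass(const DbStat *dbs, size_t ndbs, bool below_threshold, size_t *out)
{
	size_t		n = 0;

	assert(dbs != NULL && out != NULL);
	for (size_t i = 0; i < ndbs; i++)
	{
		if (exceeds_threshold(&dbs[i]) != below_threshold)
			out[n++] = i;
	}
	return n;
}
```

Calling select_pass() with true and then with false mirrors the two restore_dbs(stats, true) and restore_dbs(stats, false) calls, with all children reaped between the passes.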

Regards
Sachin

Attachment: pg_upgrade_improvements_v7.diff (application/octet-stream, 27.9 KB)