Re: pg_upgrade failing for 200+ million Large Objects

From: Michael Banck <mbanck(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, Robins Tharakan <tharakan(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Bruce Momjian <bruce(at)momjian(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: 2024-03-27 09:20:31
Message-ID: 6603e4e0.500a0220.a557f.4f39@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sat, Mar 16, 2024 at 06:46:15PM -0400, Tom Lane wrote:
> Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> writes:
> > On Fri, 2024-03-15 at 19:18 -0400, Tom Lane wrote:
> >> This patch seems to have stalled out again. In hopes of getting it
> >> over the finish line, I've done a bit more work to address the two
> >> loose ends I felt were probably essential to deal with:
>
> > Applies and builds fine.
> > I didn't scrutinize the code, but I gave it a spin on a database with
> > 15 million (small) large objects. I tried pg_upgrade --link with and
> > without the patch on a debug build with the default configuration.
>
> Thanks for looking at it!
>
> > Without the patch:
> > Runtime: 74.5 minutes
>
> > With the patch:
> > Runtime: 70 minutes
>
> Hm, I'd have hoped for a bit more runtime improvement.

I also think that this is quite a large runtime for pg_upgrade, but the
more important savings should be the memory usage.

> But perhaps not --- most of the win we saw upthread was from
> parallelism, and I don't think you'd get any parallelism in a
> pg_upgrade with all the data in one database. (Perhaps there is more
> to do there later, but I'm still not clear on how this should interact
> with the existing cross-DB parallelism; so I'm content to leave that
> question for another patch.)

What is the status of this? In the commitfest, this patch is marked as
"Needs Review" with Nathan as reviewer - Nathan, were you going to take
another look at this or was your mail from January 12th a full review?

My feeling is that this patch is "Ready for Committer" and it is Tom's
call to commit it during the next days or not.

I am +1 that this is an important feature/bug fix to have. Because we
have customers stuck on older versions due to their pathological large
objects usage, I did some benchmarks (jsut doing pg_dump, not
pg_upgarde) a while ago which were also very promising; however, I lost
the exact numbers/results. I am happy to do further tests if that is
required for this patch to go forward.

Also, is there a chance this is going to be back-patched? I guess it
would be enough if the ugprade target is v17 so it is less of a concern,
but it would be nice if people with millions of large objects are not
stuck until they are ready to ugprade to v17.

Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-03-27 09:25:17 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Vallimaharajan G 2024-03-27 09:07:43 Query Regarding frame options initialization in Window aggregate state