Re: pg_upgrade failures with large partition definitions on upgrades from ~13 to 14~

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, ksarabu(at)amazon(dot)com
Subject: Re: pg_upgrade failures with large partition definitions on upgrades from ~13 to 14~
Date: 2023-02-09 05:33:06
Message-ID: 3834898.1675920786@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> The following SQL sequence causes pg_upgrade to fail when it is
> executed on a cluster of ~13 that is then upgraded to 14~, assuming
> that the relation page size is 8kB.
> ...
> No fields have been added to pg_class between 13 and 14, however the
> amount of data stored in relpartbound got larger between these two
> versions (just do a length() on it for example using what I posted
> above). Hence, if the original cluster has a version of pg_class
> large enough to just fit into a single page without the need of
> toasting, it may fail when created in the new cluster: the extra
> partition bound data leaves it without enough space to fit on a page.

Bleah.

> Shouldn't we have a safeguard of some kind in the pre-check phase of
> pg_upgrade at least? I think that this comes down to checking
> sum(pg_column_size(pg_class.*)), roughly, with alignment and page
> header, and do the same for pg_attribute.

It might be worth expending a pre-check on, if only because the
check could offer some advice about fixing the problem. But it
seems like quite a corner case --- what are the odds of hitting
this?
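
For illustration only, the arithmetic behind such a pre-check could be
sketched as below. This is not pg_upgrade code. It assumes the default 8kB
page, a 24-byte page header, a 4-byte line pointer and 8-byte maximum
alignment, and it treats a per-row size (for instance, what
pg_column_size(pg_class.*) reports) as an approximation of the on-disk
tuple size. The function names and the growth figures passed in are
hypothetical.

```python
# Sketch only: constants mirror a default 8kB PostgreSQL build (assumed),
# and all function names here are illustrative, not part of pg_upgrade.
BLCKSZ = 8192            # relation page size assumed in the report
PAGE_HEADER_SIZE = 24    # size of the fixed page header
ITEM_ID_SIZE = 4         # one line pointer per tuple
MAXIMUM_ALIGNOF = 8      # typical 64-bit platform


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXIMUM_ALIGNOF."""
    return (n + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def max_heap_tuple_size(blcksz: int = BLCKSZ) -> int:
    """Largest tuple that fits alone on an otherwise empty page."""
    return blcksz - maxalign(PAGE_HEADER_SIZE + ITEM_ID_SIZE)


def rows_at_risk(row_sizes: dict, growth: dict) -> list:
    """Names of rows that fit on a page now but would not after growing.

    row_sizes: relation name -> approximate current row size in bytes
    growth:    relation name -> hypothetical extra bytes in the new version
    """
    limit = max_heap_tuple_size()
    return [
        name
        for name, size in row_sizes.items()
        if size <= limit < size + growth.get(name, 0)
    ]
```

With these assumptions the per-tuple limit works out to 8160 bytes, so a
row of 8100 bytes that grows by 100 bytes would be flagged, while a
500-byte row would not.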

regards, tom lane
