Re: making relfilenodes 56 bits

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-08-22 07:55:34
Message-ID: CAA4eK1La-ZYvCjZAmShmGFvvhHNo2wbtSp80kXedS3d8A8R=aA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 30, 2022 at 1:59 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jul 20, 2022 at 7:27 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > There was also an issue where the user table from the old cluster's
> > relfilenode could conflict with the system table of the new cluster.
> > As a solution currently for system table object (while creating
> > storage first time) we are keeping the low range of relfilenumber,
> > basically we are using the same relfilenumber as OID so that during
> > upgrade the normal user table from the old cluster will not conflict
> > with the system tables in the new cluster. But with this solution
> > Robert told me (in off list chat) a problem that in future if we want
> > to make relfilenumber completely unique within a cluster by
> > implementing the CREATEDB differently then we can not do that as we
> > have created fixed relfilenodes for the system tables.
> >
> > I am not sure what exactly we can do to avoid that because even if we
> > do something to avoid that in the new cluster the old cluster might
> > be already using the non-unique relfilenode so after upgrading the new
> > cluster will also get those non-unique relfilenode.
>
> I think this aspect of the patch could use some more discussion.
>
> To recap, the problem is that pg_upgrade mustn't discover that a
> relfilenode that is being migrated from the old cluster is being used
> for some other table in the new cluster. Since the new cluster should
> only contain system tables that we assume have never been rewritten,
> they'll all have relfilenodes equal to their OIDs, and thus less than
> 16384. On the other hand all the user tables from the old cluster will
> have relfilenodes greater than 16384, so we're fine. pg_largeobject,
> which also gets migrated, is a special case. Since we don't change OID
> assignments from version to version, it should have either the same
> relfilenode value in the old and new clusters, if never rewritten, or
> else the value in the old cluster will be greater than 16384, in which
> case no conflict is possible.
>
> But if we just assign all relfilenode values from a central counter,
> then we have got trouble. If the new version has more system catalog
> tables than the old version, then some value that got used for a user
> table in the old version might get used for a system table in the new
> version, which is a problem. One idea for fixing this is to have two
> RelFileNumber ranges: a system range (small values) and a user range.
> System tables get values in the system range initially, and in the
> user range when first rewritten. User tables always get values in the
> user range. Everything works fine in this scenario except maybe for
> pg_largeobject: what if it gets one value from the system range in the
> old cluster, and a different value from the system range in the new
> cluster, but some other system table in the new cluster gets the value
> that pg_largeobject had in the old cluster? Then we've got trouble.
>

To solve that problem, how about rewriting any system table in the new
cluster that has a conflicting relfilenode? I think we could do this
conflict check up front, before processing the tables from the old
cluster.
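
As a rough illustration of the conflict check described above, here is a
minimal Python sketch. It is not pg_upgrade code. The function name, the
dict-based catalog representation, the table name pg_newcatalog, and the
numeric values are all hypothetical, and it assumes that a rewritten
system table simply takes the next unused value from the user range.

```python
def rewrite_conflicting_system_tables(incoming, new_system, next_user_value):
    """Resolve relfilenumber conflicts before migrating old-cluster tables.

    incoming:        dict of table name -> relfilenumber for tables that will
                     be migrated from the old cluster (hypothetical model).
    new_system:      dict of system table name -> relfilenumber in the new
                     cluster.
    next_user_value: first candidate value in the user range (assumption).

    Returns (updated new-cluster mapping, sorted list of rewritten tables).
    """
    # Map each incoming relfilenumber back to the table that owns it.
    owner = {rfn: name for name, rfn in incoming.items()}

    # Values that must not be handed out to a rewritten table.
    in_use = set(new_system.values()) | set(owner)

    result = dict(new_system)
    rewritten = []
    for name in sorted(new_system):
        rfn = new_system[name]
        # A table that keeps its own number across clusters is not a conflict.
        if rfn in owner and owner[rfn] != name:
            while next_user_value in in_use:
                next_user_value += 1
            result[name] = next_user_value  # stands in for "rewrite the table"
            in_use.add(next_user_value)
            rewritten.append(name)
    return result, rewritten


# pg_largeobject had 2613 in the old cluster, but in the new cluster a
# different system table holds 2613, so that table is rewritten first.
old_tables = {"pg_largeobject": 2613, "t1": 100000}
new_tables = {"pg_largeobject": 2700, "pg_newcatalog": 2613}
new_map, rewritten = rewrite_conflicting_system_tables(
    old_tables, new_tables, next_user_value=100000
)
```

The check runs entirely against the new cluster's catalog state before any
file is transferred, which is the ordering suggested above.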

--
With Regards,
Amit Kapila.
