Re: making relfilenodes 56 bits

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-08-22 20:16:15
Message-ID: CA+TgmoY7ZJw3V8McB=vN2m=sLmhaySY+7xNxuv1dgp+tYM6Hjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> To solve that problem, how about rewriting the system table in the new
> cluster which has a conflicting relfilenode? I think we can probably
> do this conflict checking before processing the tables from the old
> cluster.

Thanks for chiming in.

Right now, there are two parts to the relfilenumber preservation
system, and this scheme doesn't quite fit into either of them. First,
the dump includes commands to set pg_class.relfilenode in the new
cluster to the same value that it had in the old cluster. The dump
can't include any SQL commands that depend on what's happening in the
new cluster because pg_dump(all) only connects to a single cluster,
which in this case is the old cluster. Second, pg_upgrade itself
copies the files from the old cluster to the new cluster. This doesn't
involve a database connection at all. So there's no part of the
current relfilenode preservation mechanism that can look at the old
AND the new database and decide on some SQL to execute against the new
database.

I thought for a while that we could use the information that's already
gathered by get_rel_infos() to do what you're suggesting here, but it
doesn't quite work, because that function excludes system tables, and
we can't afford to do that here. We'd either need to modify that query
to include system tables - at least for the new cluster - or run a
separate one to gather information about system tables in the new
cluster. Then, we could put all the pg_class.relfilenode values we
found in the new cluster into a hash table, loop over the list of rels
this function found in the old cluster, and for each one, probe into
the hash table. If we find a match, that's a system table that needs
to be moved out of the way before calling create_new_objects(), or
maybe inside that function but before it runs pg_restore.

That doesn't seem too crazy, I think. It's a little bit of new
mechanism, but it doesn't sound horrific. It's got the advantage of
being significantly cheaper than my proposal of moving everything out
of the way unconditionally, and at the same time it retains one of the
key advantages of that proposal - IMV, anyway - which is that we don't
need separate relfilenode ranges for user and system objects any more.
So I guess on balance I kind of like it, but maybe I'm missing
something.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ibrar Ahmed 2022-08-22 20:26:08 Re: CFM Manager
Previous Message Nikita Malakhov 2022-08-22 19:41:33 Re: [PATCH] ALTER TABLE ... SET STORAGE default