Re: making relfilenodes 56 bits

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-09-03 08:20:36
Message-ID: CAFiTN-t7ppiggRuFTkZJLA3MaQbb5a0pA+gH3Gs302rYMV7_vg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 30, 2022 at 9:23 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Well, that's very awkward. It doesn't seem like it would be very
> difficult to teach pg_upgrade to call pg_restore without --clean and
> just do the drop database itself, but that doesn't really help,
> because pg_restore will in any event be creating the new database.
> That doesn't seem like something we can practically refactor out,
> because only pg_dump knows what properties to use when creating the
> new database. What we could do is have the dump include a command like
> SELECT pg_binary_upgrade_move_things_out_of_the_way(some_arguments_here),
> but that doesn't really help very much, because passing the whole list
> of relfilenode values from the old database seems pretty certain to be
> a bad idea. The whole idea here was that we'd be able to build a hash
> table on the new database's system table OIDs, and it seems like
> that's not going to work.

Right.

> We could try to salvage some portion of the idea by making
> pg_binary_upgrade_move_things_out_of_the_way() take a more restricted
> set of arguments, like the smallest and largest relfilenode values
> from the old database, and then we'd just need to move things that
> overlap. But that feels pretty hit-or-miss to me as to whether it
> actually avoids any work, and
> pg_binary_upgrade_move_things_out_of_the_way() might also be annoying
> to write. So perhaps we have to go back to the drawing board here.
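A minimal sketch of the restricted-argument variant quoted above, to make the "hit-or-miss" concern concrete. This is not pg_upgrade code. The function name count_overlapping() and the RelFileNumber typedef are hypothetical stand-ins (64-bit storage is assumed here, since the thread proposes 56-bit values). Given only the old database's smallest and largest relfilenode, every new-cluster catalog whose relfilenode falls inside that range has to be treated as a possible collision, whether or not it actually collides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relfilenode type; 64-bit storage assumed. */
typedef uint64_t RelFileNumber;

/*
 * Hypothetical helper: mark which of the new cluster's system-table
 * relfilenodes fall inside [old_min, old_max] and therefore would have to
 * be moved out of the way.  The check is conservative: a value inside the
 * range may not actually be used by the old database.  Returns the number
 * of entries marked for moving.
 */
size_t
count_overlapping(RelFileNumber old_min, RelFileNumber old_max,
                  const RelFileNumber *new_catalogs, size_t n,
                  int *must_move)
{
    size_t moved = 0;

    assert(old_min <= old_max);
    for (size_t i = 0; i < n; i++)
    {
        /* Only a value inside the old range can possibly collide. */
        must_move[i] = (new_catalogs[i] >= old_min &&
                        new_catalogs[i] <= old_max);
        moved += (size_t) must_move[i];
    }
    return moved;
}
```

With a narrow old range only a few catalogs are marked, but if the old database spans a wide range (say 1000 to 100000), every catalog is marked and the restricted arguments save no work at all.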

So, as of now, we have two open options: 1) the current approach,
which the patch follows, of using the OID as the relfilenode for
system tables when they are initially created; 2) calling
pg_binary_upgrade_move_things_out_of_the_way(), which forces a rewrite
of all the system tables.

Here is another idea, though I am not sure how feasible it is. Can we
change the dump so that, in binary upgrade mode, the CREATE DATABASE
command does not use template0 as the template but instead some new
database, e.g. template-XYZ? For conflict checking, we would first
create this template-XYZ database on the new cluster and then perform
all the conflict checks (against all the databases of the old cluster)
and the rewrite operations on it. Later, all the databases would be
created using template-XYZ as the template, so all the rewriting we
have done would still be intact. The problems I can think of are: 1)
we will have to change pg_dump just for binary upgrade; 2) we will
have to reserve another database name, and that name might already be
in use in the old cluster.
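A rough sketch of the conflict-check-and-rewrite step on the template database described above, under stated assumptions: resolve_template_conflicts(), the RelFileNumber typedef, and the next_free starting point are all hypothetical, and the real work would happen on catalog storage rather than on an in-memory array. The idea is to take the union of relfilenodes used by every old-cluster database (sorted, so it can be searched with bsearch) and give each colliding template catalog a fresh value that is used neither in the old cluster nor elsewhere in the template.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relfilenode type; 64-bit storage assumed. */
typedef uint64_t RelFileNumber;

static int
cmp_relfilenumber(const void *a, const void *b)
{
    RelFileNumber x = *(const RelFileNumber *) a;
    RelFileNumber y = *(const RelFileNumber *) b;

    return (x > y) - (x < y);
}

/* True if v appears in the ascending-sorted array arr. */
static bool
in_sorted(RelFileNumber v, const RelFileNumber *arr, size_t n)
{
    return bsearch(&v, arr, n, sizeof(RelFileNumber),
                   cmp_relfilenumber) != NULL;
}

/* True if v appears anywhere in the (unsorted) array arr. */
static bool
in_array(RelFileNumber v, const RelFileNumber *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (arr[i] == v)
            return true;
    return false;
}

/*
 * Hypothetical: rewrite every template-XYZ catalog relfilenode that
 * collides with one used by any old-cluster database.  old_used must be
 * sorted ascending.  Returns the number of catalogs rewritten.
 */
size_t
resolve_template_conflicts(RelFileNumber *tmpl, size_t ntmpl,
                           const RelFileNumber *old_used, size_t nold,
                           RelFileNumber next_free)
{
    size_t rewritten = 0;

    assert(tmpl != NULL || ntmpl == 0);
    for (size_t i = 0; i < ntmpl; i++)
    {
        if (!in_sorted(tmpl[i], old_used, nold))
            continue;           /* no conflict, keep as-is */

        /* Skip values taken by the old cluster or by the template itself. */
        while (in_sorted(next_free, old_used, nold) ||
               in_array(next_free, tmpl, ntmpl))
            next_free++;

        tmpl[i] = next_free++;  /* "rewrite" this catalog to a free value */
        rewritten++;
    }
    return rewritten;
}
```

For example, with template catalogs {1259, 16384, 16390}, old-cluster values {16384, 16385, 16390}, and next_free starting at 16384, the two colliding catalogs would be moved to 16386 and 16387, while 1259 is left alone. Because every later database is created from template-XYZ, those rewrites would only need to be done once.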

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
