preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Shruthi Gowda <gowdashru(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Date: 2021-08-17 15:56:30
Message-ID: CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 11, 2021 at 3:41 AM Shruthi Gowda <gowdashru(at)gmail(dot)com> wrote:
> I have fixed all the issues and now the patch is working as expected.

Hi,

I'm changing the subject line since the patch does something which was
discussed on that thread but isn't really related to the old email
subject. In general, I think this patch is uncontroversial and in
reasonably good shape. However, there's one part that I'm not too sure
about. If Tom Lane happens to be paying attention to this thread, I
think his feedback would be particularly useful, since he knows a lot
about the inner workings of pg_dump. Opinions from anybody else would
be great, too. Anyway, here's the hunk that worries me:

+
+ /*
+ * Need a separate entry, otherwise the command will
be run in the
+ * same transaction as the CREATE DATABASE command, which is not
+ * allowed.
+ */
+ ArchiveEntry(fout,
+ dbCatId, /* catalog ID */
+ dbDumpId, /* dump ID */
+ ARCHIVE_OPTS(.tag = datname,
+ .owner = dba,
+
.description = "SET_DB_OID",
+
.section = SECTION_PRE_DATA,
+
.createStmt = setDBIdQry->data,
+
.dropStmt = NULL));
+

To me, adding a separate TOC entry for a thing that is not really a
separate object seems like a scary hack that might come back to bite
us. Unfortunately, I don't know enough about pg_dump to say exactly
how it might come back to bite us, which leaves wide open the
possibility that I am completely wrong.... I just think it's the
intention that archive entries correspond to actual objects in the
database, not commands that we want executed in some particular order.
If that criticism is indeed correct, then my proposal would be to
instead add a WITH OID = nnn option to CREATE DATABASE and allow it to
be used only in binary upgrade mode. That has the disadvantage of
being inconsistent with the way that we preserve OIDs everywhere else,
but the only other alternatives are (1) do something like the above,
(2) remove the requirement that CREATE DATABASE run in its own
transaction, and (3) give up. (2) sounds hard and (3) is unappealing.

The rest of this email will be detailed review comments on the patch
as presented, and thus probably only interesting to someone actually
working on the patch. Feel free to skip if that's not you.

- I suggest splitting the patch into one portion that deals with
database OID and another portion that deals with tablespace OID and
relfilenode OID, or maybe splitting it all the way into three separate
patches, one for each. This could allow the uncontroversial parts to
get committed first while we're wondering what to do about the problem
described above.

- There are two places in the patch, one in dumpDatabase() and one in
generate_old_dump() where blank lines are removed with no other
changes. Please avoid whitespace-only hunks.

- If possible, please try to pgindent the new code. It's pretty good
what you did, but e.g. the declaration of
binary_upgrade_next_pg_tablespace_oid probably has less whitespace
than pgindent is going to want.

- The comments in dumpDatabase() claim that "postgres" and "template1"
are handled specially in some way, but there seems to be no code that
matches those comments.

- heap_create()'s logic around setting create_storage looks slightly
redundant. I'm not positive what would be better, but ... suppose you
just took the part that's currently gated by if (!IsBinaryUpgrade) and
did it unconditionally. Then put if (IsBinaryUpgrade) around the else
clause, but delete the last bit from there that sets create_storage.
Maybe we'd still want a comment saying that it's intentional that
create_storage = true even though it will be overwritten later, but
then, I think, we wouldn't need to set create_storage in two different
places. Maybe I'm wrong.

- If we're not going to do that, then I think you should swap the if
and else clauses and reverse the sense of the test. In createdb(),
CreateTableSpace(), and a bunch of existing places, we do if
(IsBinaryUpgrade) { ... } else { ... } so I don't think it makes sense
for this one to instead do if (!IsBinaryUpgrade) { ... } else { ... }.

- I'm not sure that I'd bother renaming
binary_upgrade_set_pg_class_oids_and_relfilenodes(). It's such a long
name, and a relfilenode is kind of an OID, so the current name isn't
even really wrong. I'd probably drop the header comment too, since it
seems rather obvious. But both of these things are judgement calls.

- Inside that function, there is a comment that says "Indexes cannot
have toast tables, so we need not make this probe in the index code
path." However, you have moved the code from someplace where it didn't
happen for indexes to someplace where it happens for both tables and
indexes. Therefore the comment, which was true when the code was where
it was before, is now false. So you need to update it.

- It is not clear to me why pg_upgrade's Makefile needs to be changed
to include -DFRONTEND in CPPFLAGS. All of the .c files in this
directory include postgres_fe.h rather than postgres.h, and that file
has #define FRONTEND 1. Moreover, there are no actual code changes in
this directory, so why should the Makefile need any change?

- A couple of comment changes - and the commit message - mention data
encryption, but that's not a feature that this patch implements, nor
are we committed to adding it in the immediate future (or ever,
really). So I think those places should be revised to say that we do
this because we want the filenames to match between the old and new
clusters, and leave the reasons why that might be a good thing up to
the reader's imagination.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-08-17 16:12:54 Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Previous Message Jack Christensen 2021-08-17 15:41:00 Re: Allow composite foreign keys to reference a superset of unique constraint columns?