Re: making relfilenodes 56 bits

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-08-23 14:30:30
Message-ID: CA+Tgmoap-iDjSFX6BjvyJtchmwEWxrQ9c=_bv_PtSAobkJ8PTw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 23, 2022 at 2:06 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> OTOH, if we keep the two separate ranges for the user and system table
> then we don't need all this complex logic of conflict checking.

True. That's the downside. The question is whether it's worth adding
some complexity to avoid needing separate ranges.

Honestly, if we don't care about having separate ranges, we can do
something even simpler and just make the starting relfilenumber for
system tables the same as the OID. Then we don't have to do anything at
all, outside of not changing the OID assigned to pg_largeobject in a
future release. Then as long as pg_upgrade is targeting a new cluster
with completely fresh databases that have not had any system table
rewrites so far, there can't be any conflict.

And perhaps that is the best solution after all, but while it is
simple in terms of code, I feel it's a bit complicated for human
beings. It's very simple to understand the scheme that Amit proposed:
if there's anything in the new cluster that would conflict, we move it
out of the way. We don't have to assume the new cluster hasn't had any
table rewrites. We don't have to nail down starting relfilenumber
assignments for system tables. We don't have to worry about
relfilenumber or OID assignments changing between releases.
pg_largeobject is not a special case. There are no special ranges of
OIDs or relfilenumbers required. It just straight up works -- all the
time, no matter what, end of story.
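To make the "move it out of the way" idea concrete, here is a minimal toy sketch in C. It is not pg_upgrade code and not Amit's patch: ToyRel, in_use(), wanted(), move_conflicts(), and the next_free counter are all hypothetical names invented for illustration, and the relation list is modeled as a plain array. It assumes only that the old cluster hands us a list of relfilenumbers that must be preserved, and that any new-cluster relation already holding one of them can simply be given a different, unused number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a relation in the new cluster (not a PostgreSQL struct). */
typedef struct ToyRel
{
    uint32_t oid;
    uint64_t relfilenumber;     /* wide enough to hold a 56-bit value */
} ToyRel;

/* Is this relfilenumber currently held by some relation in the new cluster? */
static bool
in_use(const ToyRel *rels, size_t nrels, uint64_t relfilenumber)
{
    for (size_t i = 0; i < nrels; i++)
        if (rels[i].relfilenumber == relfilenumber)
            return true;
    return false;
}

/* Is this relfilenumber one that the old cluster needs preserved? */
static bool
wanted(const uint64_t *incoming, size_t nincoming, uint64_t relfilenumber)
{
    for (size_t i = 0; i < nincoming; i++)
        if (incoming[i] == relfilenumber)
            return true;
    return false;
}

/*
 * Give a fresh relfilenumber to every new-cluster relation that holds a
 * number the old cluster needs. The replacement is taken from *next_free,
 * skipping values that are already in use or are themselves wanted.
 * Returns the number of relations that were moved.
 */
static size_t
move_conflicts(ToyRel *new_rels, size_t nrels,
               const uint64_t *incoming, size_t nincoming,
               uint64_t *next_free)
{
    size_t moved = 0;

    for (size_t i = 0; i < nrels; i++)
    {
        if (!wanted(incoming, nincoming, new_rels[i].relfilenumber))
            continue;           /* no conflict, leave it alone */

        while (in_use(new_rels, nrels, *next_free) ||
               wanted(incoming, nincoming, *next_free))
            (*next_free)++;

        new_rels[i].relfilenumber = (*next_free)++;
        moved++;
    }
    return moved;
}
```

Note that nothing in the sketch depends on which relation conflicts, on whether the new cluster has had table rewrites, or on any reserved range: a conflicting relation is just renumbered, whatever it is.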

The other schemes we're talking about here all require a bunch of
assumptions about stuff like what I just mentioned. We can certainly
do it that way, and maybe it's even for the best. But I feel like it's
a little bit fragile. Maybe some future change gets blocked because it
would break one of the assumptions that the system relies on, or maybe
someone doesn't even realize there's an issue and changes something
that introduces a bug into this system. Or on the other hand maybe
not. But I think there's at least some value in considering whether
adding a little more code might actually make things simpler to reason
about, and whether that might be a good enough reason to do it.

--
Robert Haas
EDB: http://www.enterprisedb.com
