Re: making relfilenodes 56 bits

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-07-11 19:34:29
Message-ID: 20220711193429.g45fzlf7gs6rjuon@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-07-11 15:08:57 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 2:57 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I don't know where we could fit a sanity check that connects to all databases
> > and detects duplicates across all the pg_class instances. Perhaps pg_amcheck?
>
> Unless we're going to change the way CREATE DATABASE works, uniqueness
> across databases is not guaranteed.

You could likely address that by not flagging a conflict when the oid also
matches? Not sure if it's worth it, but ...
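
A minimal sketch of that rule, not a proposed patch: given rows already
collected from every database's pg_class, count a relfilenode as conflicting
only if it shows up under more than one oid. The RelEntry struct and
count_relfilenode_conflicts() are hypothetical names for illustration, and
how the rows would be gathered (pg_amcheck or otherwise) is left out. Rows
with relfilenode 0 (mapped relations) are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row gathered from one database's pg_class. */
typedef struct RelEntry
{
    uint32_t dboid;        /* database the row came from */
    uint32_t reloid;       /* pg_class.oid */
    uint64_t relfilenode;  /* pg_class.relfilenode, 0 for mapped relations */
} RelEntry;

/* Order by relfilenode so that equal values become adjacent. */
static int
cmp_relfilenode(const void *a, const void *b)
{
    const RelEntry *x = a;
    const RelEntry *y = b;

    if (x->relfilenode != y->relfilenode)
        return (x->relfilenode < y->relfilenode) ? -1 : 1;
    return 0;
}

/*
 * Return the number of distinct relfilenode values that are used by more
 * than one oid. A relfilenode shared by rows with the same oid (for example
 * a relation copied by CREATE DATABASE) is not flagged.
 */
static size_t
count_relfilenode_conflicts(RelEntry *entries, size_t n)
{
    size_t conflicts = 0;
    size_t i = 0;

    assert(entries != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(entries, n, sizeof(RelEntry), cmp_relfilenode);

    while (i < n)
    {
        size_t j = i + 1;
        int    conflict = 0;

        /* Walk the group of rows sharing entries[i].relfilenode. */
        while (j < n && entries[j].relfilenode == entries[i].relfilenode)
        {
            if (entries[j].reloid != entries[i].reloid)
                conflict = 1;
            j++;
        }
        /* relfilenode 0 is not a real file number, so never flag it. */
        if (conflict && entries[i].relfilenode != 0)
            conflicts++;
        i = j;
    }
    return conflicts;
}
```

With two databases where one relation was copied (same oid, same
relfilenode) and another pair shares a relfilenode under different oids,
only the second pair would be counted.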

> > Maybe the easiest fix here would be to replace the file atomically. Then we
> > don't need this <= 512 byte stuff. These are done rarely enough that I don't
> > think the overhead of creating a separate file, fsyncing that, renaming,
> > fsyncing, would be a problem?
>
> Anything we can reasonably do to reduce the number of places where
> we're relying on things being <= 512 bytes seems like a step in the
> right direction to me. It's very difficult to know whether such code
> is correct, or what the probability is that crossing the 512-byte
> boundary would break anything.

Seems pretty simple to do. Have write_relmapper_file() write to a .tmp file
first (likely adding O_TRUNC to flags), use durable_rename() to rename it into
place. The tempfile should probably be written out before the XLogInsert(),
the durable_rename() after, although I think it'd also be correct to more
closely approximate the current sequence.
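
To illustrate the sequence, here is a standalone sketch using plain POSIX
calls rather than the actual relmapper code. write_temp_file(),
rename_durably() and fsync_path() are hypothetical helpers; the second one
only approximates what durable_rename() does, without its error reporting.
It assumes a platform where a directory can be opened read-only and fsynced.
In the ordering suggested above, the WAL record would be inserted between
the two steps.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* fsync a file or directory by path; returns 0 on success, -1 on error. */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Step 1: write the complete new contents to tmppath and flush them. */
static int
write_temp_file(const char *tmppath, const void *buf, size_t len)
{
    const char *p = buf;
    int         fd;

    assert(buf != NULL || len == 0);

    /* O_TRUNC discards a leftover temp file from an earlier crash. */
    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Step 2: rename into place, then make the rename itself durable. */
static int
rename_durably(const char *tmppath, const char *path, const char *dir)
{
    if (rename(tmppath, path) != 0)
        return -1;
    if (fsync_path(path) != 0)
        return -1;
    /* The directory entry has to reach disk too. */
    return fsync_path(dir);
}
```

Because rename() replaces the target atomically, a reader sees either the
complete old file or the complete new one, regardless of file size, which is
what removes the dependency on the contents being <= 512 bytes.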

It's a lot more problematic to do this for the control file, because we can
end up updating that at a high frequency on standbys, due to minRecoveryPoint.

I have wondered about maintaining that in a dedicated file instead, and
perhaps even doing so on a primary.

Greetings,

Andres Freund
