From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making relfilenodes 56 bits |
Date: | 2022-07-11 19:34:29 |
Message-ID: | 20220711193429.g45fzlf7gs6rjuon@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,

On 2022-07-11 15:08:57 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 2:57 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I don't know where we could fit a sanity check that connects to all databases
> > and detects duplicates across all the pg_class instances. Perhaps pg_amcheck?
>
> Unless we're going to change the way CREATE DATABASE works, uniqueness
> across databases is not guaranteed.

You could likely address that by not flagging a conflict if the oid also
matches? Not sure if it's worth it, but ...

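
A rough sketch of what such a check could look like, assuming the (database, oid, relfilenode) triples have already been collected from every database's pg_class. The `RelEntry` struct and `count_relfilenode_conflicts()` are hypothetical names for illustration, not existing PostgreSQL or pg_amcheck code, and tablespaces are ignored for simplicity:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row collected from one database's pg_class. */
typedef struct RelEntry
{
    uint32_t dboid;        /* database the row was read from */
    uint32_t reloid;       /* pg_class.oid */
    uint64_t relfilenode;  /* pg_class.relfilenode */
} RelEntry;

/* Order by relfilenode, then by oid, so candidates end up adjacent. */
static int
rel_entry_cmp(const void *a, const void *b)
{
    const RelEntry *x = a;
    const RelEntry *y = b;

    if (x->relfilenode != y->relfilenode)
        return (x->relfilenode > y->relfilenode) - (x->relfilenode < y->relfilenode);
    return (x->reloid > y->reloid) - (x->reloid < y->reloid);
}

/*
 * Sort the entries in place and count adjacent pairs that share a
 * relfilenode but have different oids.  Pairs with the same oid are not
 * flagged, since CREATE DATABASE copies relations from the template
 * together with their relfilenodes.
 */
size_t
count_relfilenode_conflicts(RelEntry *entries, size_t n)
{
    size_t conflicts = 0;

    qsort(entries, n, sizeof(RelEntry), rel_entry_cmp);

    for (size_t i = 1; i < n; i++)
    {
        /* Assumption: relfilenode 0 (e.g. mapped relations) is skipped. */
        if (entries[i].relfilenode == 0)
            continue;
        if (entries[i].relfilenode == entries[i - 1].relfilenode &&
            entries[i].reloid != entries[i - 1].reloid)
            conflicts++;
    }
    return conflicts;
}
```

Sorting keeps the check at O(n log n) over all databases, at the cost of holding every entry in memory at once.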
> > Maybe the easiest fix here would be to replace the file atomically. Then we
> > don't need this <= 512 byte stuff. These are done rarely enough that I don't
> > think the overhead of creating a separate file, fsyncing that, renaming,
> > fsyncing, would be a problem?
>
> Anything we can reasonably do to reduce the number of places where
> we're relying on things being <= 512 bytes seems like a step in the
> right direction to me. It's very difficult to know whether such code
> is correct, or what the probability is that crossing the 512-byte
> boundary would break anything.

Seems pretty simple to do. Have write_relmapper_file() write to a .tmp file
first (likely adding O_TRUNC to flags), then use durable_rename() to rename it
into place. The tempfile should probably be written out before the
XLogInsert(), the durable_rename() after, although I think it'd also be
correct to more closely approximate the current sequence.
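
For illustration, the write-to-temp, fsync, rename, fsync-directory pattern can be sketched with plain POSIX calls. This is not the relmapper code and does not use PostgreSQL's durable_rename(); `replace_file_atomically()` and `fsync_parent_dir()` are hypothetical helpers, and the WAL ordering discussed above is left out:

```c
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* fsync the directory containing `path` so that a rename in it is durable. */
static int
fsync_parent_dir(const char *path)
{
    char buf[1024];
    int  fd, rc;

    if (strlen(path) >= sizeof(buf))
        return -1;
    strcpy(buf, path);              /* dirname() may modify its argument */
    fd = open(dirname(buf), O_RDONLY);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/* Replace `path` with `data` so readers see either the old or new file. */
int
replace_file_atomically(const char *path, const void *data, size_t len)
{
    char        tmppath[1024];
    const char *p = data;
    int         fd;

    if (snprintf(tmppath, sizeof(tmppath), "%s.tmp", path) >= (int) sizeof(tmppath))
        return -1;

    /* O_TRUNC discards leftovers from an earlier, interrupted attempt. */
    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            goto fail;
        p += n;
        len -= (size_t) n;
    }

    /* The new contents must be on disk before the rename makes them visible. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }
    return fsync_parent_dir(path);

fail:
    close(fd);
    unlink(tmppath);
    return -1;
}
```

Because rename() replaces the target atomically, the file's size no longer matters for crash safety, which is what would remove the reliance on the <= 512 byte assumption.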

It's a lot more problematic to do this for the control file, because we can
end up updating that at a high frequency on standbys, due to minRecoveryPoint.
I have wondered about maintaining that in a dedicated file instead, and
perhaps even doing so on a primary.

Greetings,
Andres Freund