Re: storing an explicit nonce

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: storing an explicit nonce
Date: 2021-06-01 14:09:49
Message-ID: 20210601140949.GC22012@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 31, 2021 at 04:16:52PM -0400, Stephen Frost wrote:
> Greetings,
>
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > On 2021-05-27 17:00:23 -0400, Bruce Momjian wrote:
> > > If you go in that direction, you should make sure pg_upgrade preserves
> > > what you use (it does not preserve relfilenode, just pg_class.oid)
> >
> > Is there a reason for pg_upgrade not to maintain relfilenode, aside from
> > implementation simplicity (which is a good reason!). The fact that the old and
> > new clusters have different relfilenodes does make inspecting some things a
> > bit harder.
>
> This was discussed for a bit during the Unconference (though it was
> related to backups and major upgrades which involves replicas) and the
> general consensus seemed to be that, no, it wasn't for any specific
> reason beyond that pg_upgrade didn't need to preserve relfilenode and
> therefore didn't.

Yes, David Steele wanted it so incremental backups after pg_upgrade were
smaller, which makes sense.

> There was a discussion around if there were possibly any pitfalls that
> we might run into, should we try to have pg_upgrade preserve
> relfilenodes but I don't *think* there were any actual show stoppers
> that came up. The simplest approach, I would think, would be to have it
> do the same thing that it does for OIDs today- basically have pg_dump in
> binary mode emit a function call to inform the backend of what
> relfilenode to use for the next CREATE statement. We would need to also
> pass into that function if the table should have a TOAST table and what
> the relfilenode for that should be too, for the base table. We'd need
> to also handle indexes, mat views, etc, of course.

Yes, exactly. The pg_upgrade.c paragraph says:

* We control all assignments of pg_class.oid (and relfilenode) so toast
* oids are the same between old and new clusters. This is important
* because toast oids are stored as toast pointers in user tables.
*
* While pg_class.oid and pg_class.relfilenode are initially the same
* in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
* FULL. In the new cluster, pg_class.oid and pg_class.relfilenode will
* be the same and will match the old pg_class.oid value. Because of
* this, old/new pg_class.relfilenode values will not match if CLUSTER,
* REINDEX, or VACUUM FULL have been performed in the old cluster.

One tricky case is pg_largeobject, which is copied from the old to new
cluster since it has user data. To preserve that relfilenode, you would
need to have pg_upgrade perform cluster surgery in each database to
renumber its relfilenode to match since it is created by initdb. I
can't think of a case where pg_upgrade already does something like that.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-06-01 14:30:42 Re: Decoding speculative insert with toast leaks memory
Previous Message Bharath Rupireddy 2021-06-01 14:01:51 Re: Alias collision in `refresh materialized view concurrently`