Re: Test instability when pg_dump orders by OID

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test instability when pg_dump orders by OID
Date: 2025-07-17 13:24:02
Message-ID: CA+TgmoYnAiypU5sYstJ0nCqr+VVubf5wczANg=KqEeOYmdVeAw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 7, 2025 at 3:27 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> Let's get rid of pg_dump's need to sort by OID, apart from catalog corruption
> scenarios.

+1. I had at one point believed that sorting by OID was a good way to
make dumps stable, but this disproves that theory. Sorting by logical
properties of the object is better.

> Adding an assert found a total of seven affected object types.
> See the second attached patch. The drawback is storing five more fields in
> pg_dump memory: oprleft, oprright, opcmethod, opfmethod, and collencoding.
> That seems minor relative to existing pg_dump memory efficiency. Since this
> is a source of test flakes in v18, I'd like to back-patch to v18. I'm not
> sure why the buildfarm hasn't seen the above diff, but I expect the diff could
> happen there. This is another nice win for the new test from commit
> 172259afb563d35001410dc6daad78b250924038. The order instability was always
> bad for users, but the test brought it to the forefront. One might argue for
> back-patching $SUBJECT further, too.

I agree with back-patching it at least as far as v18. I think it
probably wouldn't hurt anything to back-patch further, and it might
avoid future buildfarm failures. Against that, there's a remote
possibility that someone who is currently saving pg_dump output for
later comparison, say in a case where OIDs are always stable in
practice, could be displeased to see the pg_dump order change in a
minor release. But that seems like a very weak argument against
back-patching. I can't see us ever deciding to put up with buildfarm
instability on such grounds.

Reviewing:

+ * Sort by name. This differs from "Name:" in plain format output, which
+ * is a _tocEntry.tag. For example, DumpableObject.name of a constraint
+ * is pg_constraint.conname, but _tocEntry.tag of a constraint is relname
+ * and conname joined with a space.

This comment is useful, but if I were to be critical, it does a better
job saying what this field isn't than what it is.

+ * Sort by encoding, per pg_collation_name_enc_nsp_index. This is
+ * mostly academic, because CREATE COLLATION has restrictions to make
+ * (nspname, collname) uniquely identify a collation within a given
+ * DatabaseEncoding. pg_import_system_collations() bypasses those
+ * restrictions, but pg_dump+restore fails after a
+ * pg_import_system_collations('my_schema') that creates collations
+ * for a blend of encodings.

This comment is also useful, but if I were to be critical again, it
does a better job saying why we shouldn't do what the code then does
than why we should.

Neither of those issues seem like must-fix problems to me.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maksim.Melnikov 2025-07-17 13:36:10 Re: Add 64-bit XIDs into PostgreSQL 15
Previous Message Mahendra Singh Thalor 2025-07-17 12:52:38 Re: Non-text mode for pg_dumpall