Re: Making type Datum be 8 bytes everywhere

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Making type Datum be 8 bytes everywhere
Date: 2025-08-09 01:14:08
Message-ID: 2180228.1754702048@sss.pgh.pa.us
Lists: pgsql-hackers

I have just realized that this proposal has a rather nasty defect.
Per the following comment in spgist_private.h:

* If the prefix datum is of a pass-by-value type, it is stored in its
* Datum representation, that is its on-disk representation is of length
* sizeof(Datum). This is a fairly unfortunate choice, because in no other
* place does Postgres use Datum as an on-disk representation; it creates
* an unnecessary incompatibility between 32-bit and 64-bit builds. But the
* compatibility loss is mostly theoretical since MAXIMUM_ALIGNOF typically
* differs between such builds, too. Anyway we're stuck with it now.

This means we cannot change sizeof(Datum), nor reconsider the
pass-by-value classification of any datatype, without potentially
breaking pg_upgrade of some SP-GiST indexes on 32-bit machines.
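To make the hazard concrete, here is a minimal standalone sketch (not PostgreSQL code) of what "stored in its Datum representation" implies for a 4-byte pass-by-value prefix such as an int4. The typedefs `Datum32`/`Datum64` and the helper functions are hypothetical stand-ins for the two Datum widths. The sketch assumes the prefix is written by copying all sizeof(Datum) bytes into the tuple, as the comment above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two Datum widths.  A real build has a
 * single Datum type; these names exist only for this illustration. */
typedef uint32_t Datum32;   /* sizeof(Datum) on a 32-bit build today */
typedef uint64_t Datum64;   /* sizeof(Datum) if Datum is 8 bytes everywhere */

/* Write a by-value int4 prefix as a 32-bit build would: convert it to
 * a Datum and copy all sizeof(Datum32) bytes into the tuple buffer. */
static size_t store_prefix_datum32(unsigned char *buf, size_t buflen, int32_t v)
{
    Datum32 d = (Datum32) v;

    assert(buflen >= sizeof(d));
    memcpy(buf, &d, sizeof(d));
    return sizeof(d);
}

/* The same prefix as an 8-byte-Datum build would write it. */
static size_t store_prefix_datum64(unsigned char *buf, size_t buflen, int32_t v)
{
    Datum64 d = (Datum64) (int64_t) v;   /* sign-extend, then widen */

    assert(buflen >= sizeof(d));
    memcpy(buf, &d, sizeof(d));
    return sizeof(d);
}

/* Read a by-value prefix back assuming an 8-byte Datum representation. */
static Datum64 read_prefix_datum64(const unsigned char *buf)
{
    Datum64 d;

    memcpy(&d, buf, sizeof(d));
    return d;
}
```

If the bytes following the old 4-byte prefix hold unrelated data, an 8-byte read picks up 4 bytes that were never part of the prefix, so it does not return the Datum an 8-byte-Datum build would have written for the same value.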

Now, it looks like this doesn't affect any in-core SP-GiST opclasses.
The only one using a potentially affected type is kd_point_ops, which
uses a float8 prefix. Since float8 is pass-by-reference on a 32-bit
machine, that prefix will have been stored in regular on-disk format
there, but if we redefine it as being stored in 64-bit-Datum format,
nothing actually changes: both are the same 8 bytes. The case that
would be problematic is a prefix type that's 4 bytes or less, and
I don't see any.
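The float8 case can be checked the same way. This minimal sketch, again with hypothetical helper names and assuming an 8-byte IEEE double, compares the pass-by-reference layout (the value's own 8 bytes, as a 32-bit build stores it) with the layout obtained by first copying the value's bits into an 8-byte Datum, in the spirit of what Float8GetDatum does on 64-bit builds, and then writing that Datum out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Datum64;   /* hypothetical 8-byte Datum, as in the proposal */

/* Pass-by-reference layout: the float8's own 8 bytes, which is how a
 * 32-bit build stores a float8 prefix. */
static void store_float8_byref(unsigned char buf[8], double v)
{
    memcpy(buf, &v, sizeof(v));
}

/* By-value layout: copy the double's bits into a Datum (in the spirit
 * of Float8GetDatum), then write all sizeof(Datum64) bytes out. */
static void store_float8_datum64(unsigned char buf[8], double v)
{
    Datum64 d;

    memcpy(&d, &v, sizeof(d));
    memcpy(buf, &d, sizeof(d));
}

/* Returns 1 if both layouts produce byte-identical output for v. */
static int float8_layouts_match(double v)
{
    unsigned char byref[8];
    unsigned char byval[8];

    assert(sizeof(double) == sizeof(Datum64));
    store_float8_byref(byref, v);
    store_float8_datum64(byval, v);
    return memcmp(byref, byval, sizeof(byref)) == 0;
}
```

Both paths copy the same 8 bytes in host byte order, so the layouts match for any value; only prefixes narrower than the new Datum width change size.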

A quick search of Debian Code Search doesn't find any extensions
that look like they are using small pass-by-value prefixes either.
So maybe we can get away with just changing this, but it's worrisome.

On the positive side, even if there are any SP-GiST opclasses that
are at risk, the population of 32-bit installations using them has
got to be pretty tiny. And the worst-case answer is that you'd have
to reindex such indexes after pg_upgrade.

BTW, I don't think we can teach pg_upgrade to check for this
hazard, because the SP-GiST APIs are such that the data type
used for prefixes isn't visible at the SQL level.

Do we think that making this change is valuable enough to justify
taking such a risk?

regards, tom lane
