Re: Bug in brin_minmax_multi_distance_numeric()

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bug in brin_minmax_multi_distance_numeric()
Date: 2025-08-01 17:17:04
Message-ID: dfa0e881-18c4-46f8-91fe-53979891e804@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/31/25 20:35, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> On 7/31/25 19:33, Tom Lane wrote:
>>> ... It is certainly broken on
>>> 32-bit machines where the Datum result of numeric_float8 will
>>> be a pointer, so that we will convert the numeric pointer value
>>> to a float and return that, yielding a totally-garbage distance
>>> value. But I think it's broken on 64-bit machines too, because
>>> we'll be interpreting the output of numeric_float8 as a uintptr_t
>>> and applying some unwanted conversion to make that a float.
>
>> Agreed it's a bug on 32-bit machines. Not sure about 64-bits.
>
> Yeah, I'm not 100% sure about that. It's certainly doing something
> unexpected, but we might accidentally end up with relatively-sane
> relative distance comparisons anyway. (I assume the outputs will
> only be compared to other outputs of the same function, right?)

Yes. The index accumulates values, sorts them, calculates distance
between the points, and them "merges" the closest ones. So it only
compares results of the same function.

> I have a vague recollection that the IEEE float format was chosen with
> an eye to making comparisons cheap, ie not too much different from
> integer comparisons. So the sort order might be about the same
> even after incorrectly reinterpreting the bit-pattern as an int.
> NaNs probably mess that up, but they would anyway.
>
>> Actually, no - it should not cause "broken" indexes, as in "giving
>> incorrect results". The distance functions determine in what order we
>> merge points into ranges, and if the distances are bogus then we can
>> build a summary that is less efficient.
>
> Got it. So it might be worth reindexing such indexes after the
> fix, but it's not strictly necessary.
>

Exactly.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-08-01 18:15:24 Re: Improve LWLock tranche name visibility across backends
Previous Message Masahiko Sawada 2025-08-01 17:00:48 Re: Fix a typo of comments in AutoVacLauncherMain