Re: Bug in brin_minmax_multi_distance_numeric()

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bug in brin_minmax_multi_distance_numeric()
Date: 2025-08-05 17:24:07
Message-ID: d48b2a78-7ce0-46d8-b096-2ff2331aa819@eisentraut.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01.08.25 19:17, Tomas Vondra wrote:
>
>
> On 7/31/25 20:35, Tom Lane wrote:
>> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>>> On 7/31/25 19:33, Tom Lane wrote:
>>>> ... It is certainly broken on
>>>> 32-bit machines where the Datum result of numeric_float8 will
>>>> be a pointer, so that we will convert the numeric pointer value
>>>> to a float and return that, yielding a totally-garbage distance
>>>> value. But I think it's broken on 64-bit machines too, because
>>>> we'll be interpreting the output of numeric_float8 as a uintptr_t
>>>> and applying some unwanted conversion to make that a float.
>>
>>> Agreed it's a bug on 32-bit machines. Not sure about 64-bits.
>>
>> Yeah, I'm not 100% sure about that. It's certainly doing something
>> unexpected, but we might accidentally end up with relatively-sane
>> relative distance comparisons anyway. (I assume the outputs will
>> only be compared to other outputs of the same function, right?)
>
>
> Yes. The index accumulates values, sorts them, calculates distance
> between the points, and them "merges" the closest ones. So it only
> compares results of the same function.
>
>> I have a vague recollection that the IEEE float format was chosen with
>> an eye to making comparisons cheap, ie not too much different from
>> integer comparisons. So the sort order might be about the same
>> even after incorrectly reinterpreting the bit-pattern as an int.
>> NaNs probably mess that up, but they would anyway.
>>
>>> Actually, no - it should not cause "broken" indexes, as in "giving
>>> incorrect results". The distance functions determine in what order we
>>> merge points into ranges, and if the distances are bogus then we can
>>> build a summary that is less efficient.
>>
>> Got it. So it might be worth reindexing such indexes after the
>> fix, but it's not strictly necessary.

Do we want to make a separate commit for this issue that can be
backpatched and have some user-facing information attached to it?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikhil Kumar Veldanda 2025-08-05 17:26:03 Re: Dead code with short varlenas in toast_save_datum()
Previous Message Peter Eisentraut 2025-08-05 17:20:20 pgaio_io_get_id() type (was Re: Datum as struct)